Abstract

In this paper, we complement Verdú's work on spectral efficiency in the wideband regime by investigating
the fundamental tradeoff between rate and bandwidth when a constraint is imposed on the error exponent.
Specifically, we consider both AWGN and Rayleigh-fading channels. For the AWGN channel model, the
optimal values of $R_z(0)$ and $\dot{R}_z(0)$ are calculated, where $R_z(1/B)$ is the maximum rate at which
information can be transmitted over a channel with bandwidth $B/2$ when the error exponent is constrained to
be greater than or equal to $z$. Based on this calculation, we say that a sequence of input distributions is
near-optimal if both $R_z(0)$ and $\dot{R}_z(0)$ are achieved. We show that QPSK, a widely used signaling
scheme, is near-optimal within a large class of input distributions for the AWGN channel. Similar results are
also established for a fading channel where full CSI is available at the receiver.

1 Introduction 

Communication in the wideband regime with limited power has attracted much attention recently. An important
characteristic of such communication systems is that they operate at relatively low spectral efficiency (bits per 
second per Hz) and energy per bit. The advantages of communication over large bandwidth are many-fold: 
power savings, higher data rates, more diversity to combat frequency-selective fading, etc. Thus, it is important 
to understand the ultimate limits of communications in this regime from an information-theoretic point of view, 
and develop guidelines to design good signaling schemes.

Figure 1: The reliability function for the AWGN channel with infinite bandwidth.

Communication without a bandwidth limit, i.e., when the available bandwidth is infinite, is well understood. For
the additive white Gaussian noise (AWGN) channel, the capacity, measured in nats per second, converges to
the signal-to-noise ratio (SNR) $P/N_0$ of the channel when the available bandwidth $B$ goes to infinity. Here
$P$ denotes the average power constraint at the input of the channel and $N_0/2$ is the power spectral density of
the Gaussian noise. Furthermore, a Gaussian signaling scheme is not required to achieve this limit. Nearly
all signaling schemes are equally good in the sense that the corresponding mutual information converges to the
same value in the infinite-bandwidth limit. For example, a simple on-off signaling scheme with low duty cycle is
capacity-achieving in the infinite-bandwidth limit. In [7], Massey showed that all zero-mean signaling schemes
can achieve this limit.

To establish a strong coding theorem, the reliability function $E(R)$ of the channel has to
be calculated for any coding rate $R$. Generally, the reliability function of a channel is difficult to compute and is
known for all rates only for a few channels. The infinite-bandwidth AWGN channel is one of these channels, and its
reliability function has the following form [15], [4]:

$$E(R) = \begin{cases} \dfrac{C_\infty}{2} - R, & 0 \le R \le \dfrac{C_\infty}{4}; \\ \left(\sqrt{C_\infty} - \sqrt{R}\right)^2, & \dfrac{C_\infty}{4} \le R \le C_\infty, \end{cases} \tag{1}$$

where $C_\infty = P/N_0$ denotes the infinite-bandwidth capacity, as shown in Figure 1. We will show that when the
bandwidth is infinite, a large set of input distributions achieves the optimal error-exponent curve.
We will refer to such distributions as being first-order optimal.
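
As a quick numerical companion to the reliability function above, the following Python sketch evaluates the two-branch form, assuming the standard expression $E(R) = C_\infty/2 - R$ for $0 \le R \le C_\infty/4$ and $E(R) = (\sqrt{C_\infty} - \sqrt{R})^2$ for $C_\infty/4 \le R \le C_\infty$. The function name and the example value of $C_\infty$ are illustrative choices, not part of the original text.

```python
import math


def reliability_infinite_bandwidth(rate, c_inf):
    """Error exponent of the infinite-bandwidth AWGN channel (assumed two-branch form).

    rate  : coding rate R in nats per second, 0 <= R <= c_inf
    c_inf : infinite-bandwidth capacity C_inf = P / N0 in nats per second
    """
    if not 0.0 <= rate <= c_inf:
        raise ValueError("rate must lie in [0, c_inf]")
    if rate <= c_inf / 4.0:
        # Straight-line branch below the critical rate C_inf / 4.
        return c_inf / 2.0 - rate
    # Curved branch between the critical rate and capacity.
    return (math.sqrt(c_inf) - math.sqrt(rate)) ** 2


if __name__ == "__main__":
    c_inf = 1.0  # illustrative value
    for rate in (0.0, 0.25, 0.5, 1.0):
        value = reliability_infinite_bandwidth(rate, c_inf)
        print(f"R = {rate:.2f}  E(R) = {value:.4f}")
```

The two branches meet at $R = C_\infty/4$, where both give $E = C_\infty/4$; this is the value behind the restriction $z < 1/4$ on the normalized exponent used later.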

Naturally, the results in the infinite-bandwidth regime can be considered as guidelines for designing signaling
schemes in the wideband regime as well. However, in the wideband regime (when the available bandwidth is
large, but finite), results based on infinite-bandwidth calculations can be quite misleading. In [14], Verdú
points out that to understand the performance limit in the wideband regime, two quantities need to be studied:
the minimum energy per information bit $(E_b/N_0)_{\min}$ required to sustain reliable communication, and the slope of
spectral efficiency (bits/s/Hz) at the point $(E_b/N_0)_{\min}$. If we treat $C(\cdot)$ as a function of $b = 1/B$, it is easy to see
that studying these two quantities is equivalent to studying the optimal values of the following two quantities:
the infinite-bandwidth capacity $C(0)$ and the first-order derivative of capacity with respect to $b$, $\dot{C}(0)$. In other words,
we need to study both the infinite-bandwidth capacity and the rate at which this capacity is reached. In [14], it
is shown that, while many signaling schemes achieve $C(0)$, only some of these reach the capacity at the fastest
possible rate given by $\dot{C}(0)$. We will refer to signaling schemes that achieve both $C(0)$ and $\dot{C}(0)$ as near-optimal
input distributions in the wideband regime. Further, although $C(0)$ always has the same value for non-fading
or fading channels with different CSI, $\dot{C}(0)$ is determined by the CSI and can be very different for different
channels.
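
To make these two quantities concrete, the sketch below evaluates them for the AWGN channel with Gaussian inputs, assuming the familiar expression $C(b) = \frac{1}{b}\ln(1 + Pb)$ nats per second with $N_0 = 1$ and $b = 1/B$. The helper names and the finite-difference step are illustrative assumptions.

```python
import math


def capacity(b, power):
    """C(b) = ln(1 + P*b) / b in nats per second, with b = 1/B and N0 = 1 (assumed model)."""
    if b == 0.0:
        return power  # infinite-bandwidth limit C(0) = P
    return math.log1p(power * b) / b


def wideband_terms(power, step=1e-4):
    """Return C(0) and a forward-difference estimate of the derivative of C at b = 0."""
    c0 = capacity(0.0, power)
    slope = (capacity(step, power) - c0) / step
    return c0, slope


if __name__ == "__main__":
    c0, slope = wideband_terms(power=2.0)
    print(f"C(0) = {c0:.4f}, dC/db at 0 = {slope:.4f}")
```

For this model the expansion $C(b) = P - \frac{P^2}{2} b + O(b^2)$ holds, so the estimated slope should be close to $-P^2/2$.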

This paper complements Verdú's work and considers the relationship between probability of decoding error
(represented by the reliability function), coding rate, and bandwidth for both AWGN channels and multipath
fading channels. Specifically, we study the maximum rate at which information can be transmitted over a channel,
as a function of the available bandwidth, under a certain constraint on the reliability function. For AWGN
channels, instead of characterizing the capacity $C$ as a function of $b = 1/B$ as in [14], we are interested in
characterizing $R_z$ as a function of $b$, where $R_z$ is the maximum rate such that $E(R_z) \ge z$ and $E(R)$ is the
reliability function of the channel. In the infinite-bandwidth regime, we characterize the optimal rate $R_z(0)$
with respect to a given error-exponent constraint and study the conditions under which a signaling scheme can
achieve this optimal rate. In the wideband regime, both $R_z(0)$ and $\dot{R}_z(0)$ need to be considered. A signaling
scheme that achieves both $R_z(0)$ and $\dot{R}_z(0)$ is said to be second-order optimal or near-optimal with respect
to an error-exponent constraint $z$.

For fading channels, we use a doubly-block fading model where the available bandwidth spans multiple
coherence bandwidths. If we let $W_c$ denote the coherence bandwidth, the total bandwidth of the channel is then
assumed to be $BW_c$ for some $B \ge 1$. Either a large $B$ or a large $W_c$ can lead to a large total bandwidth $BW_c$.
However, these two regimes (the large $B$ regime and the large $W_c$ regime) can have very different channel
behavior. For example, in a wireless system with a total bandwidth of 10 MHz and a delay spread of
the order of 1 μs, $W_c$ would be of the order of 1 MHz and thus $B$ is of the order of 10. In this paper, we
focus on such a system, where the coherence bandwidth $W_c$ is large, and we further assume a coherent channel
model. By defining $R_z$ to be a function of $1/W_c$, we calculate $R_z(0)$ and $\dot{R}_z(0)$. Similar to the AWGN case, for
this channel model, we will show that QPSK can achieve both $R_z(0)$ and $\dot{R}_z(0)$ and is thus near-optimal. In the
other case, where $B$ is large, it may not be appropriate to assume any form of channel side information (CSI), and
thus a non-coherent channel model is more suitable. We refer the reader to [16] for first-order asymptotic results
for MIMO channels in this regime.

This paper is organized as follows. In Section 2, we specify the channel models and formulate the problem
that we wish to study. In Section 3, we present the main results for both AWGN channels and multipath fading
channels. The proofs are presented in Sections 4 and 5. Section 6 contains concluding remarks and
discussion.



2 Channel models and problem formulation 

In this section, we will describe the channel models we use to study the behavior of both the AWGN channel 
and the multipath fading channel in the wideband regime. Further, we will formulate rigorously the problems we 
want to solve in this paper. 

2.1 AWGN channels 

We first consider a bandlimited AWGN channel with available bandwidth B/2 : 

y(t) = x(t)+w(t), (2) 

where $w(t)$ is a complex symmetric Gaussian random process. We assume an input power constraint
$P$ for the channel. For notational convenience, we assume the noise power spectral density $N_0/2 = 1/2$. Thus, the
average power $P$ also indicates the average SNR of the channel. We now sample the channel every $1/B$ seconds
and represent it as a discrete-time memoryless scalar channel as follows:

$$y = x + w, \tag{3}$$

where $w$ is a complex symmetric Gaussian random variable with variance 1, i.e., $w \sim \mathcal{CN}(0, 1)$. The power
constraint for this discrete-time channel is

$$E[|x|^2] \le \frac{P}{B}. \tag{4}$$

We want to study the asymptotic behavior of the communication rate $R$ (nats per second) in terms of the available
bandwidth $B$ under this power constraint and an error-exponent constraint, which is described below.

Let $P_e(N, R, P, B)$ be the minimum probability of decoding error for any block code with codeword length
$N$ seconds (or equivalently, $NB$ symbols) and coding rate $R$. The error exponent at communication rate $R$ (also
called the reliability function) of this channel is defined as

$$E(R, P, B) = \lim_{N \to \infty} -\frac{\ln P_e(N, R, P, B)}{N}. \tag{5}$$

We desire a lower bound for $E(R, P, B)$ and denote it by $Pz$. (Without loss of generality, we scale the desired
minimum value for the error exponent by $P$ for mathematical convenience.) Let $R_z(b)$ denote the maximum
rate at which communication is possible given this desired error exponent when the available bandwidth
is $B = 1/b$. Since $E(R, P, B)$ is a decreasing function of $R$, $R_z(b)$ is the solution to the equation

$$E(R, P, 1/b) = Pz. \tag{6}$$

Our goals for AWGN channels are twofold:

1. Calculate $R_z(0)$ and $\dot{R}_z(0)$.

2. Characterize the properties of first-order optimal signaling schemes, i.e., those that achieve $R_z(0)$. More
importantly, find near-optimal or second-order optimal signaling schemes in the wideband regime such
that both $R_z(0)$ and $\dot{R}_z(0)$ can be achieved.

In the rest of the paper, we drop the subscript and simply refer to $R_z$ as $R$. From the context, it should be
clear that $R$ is a function of $z$.

2.2 Coherent fading channels 

In this section, we will explain the model we will use for a multi-path fading channel and formulate the problem 
in the wideband regime we want to solve for such channels. 

To characterize a multipath fading channel, we use a doubly-block Rayleigh fading model. Specifically, we
assume block fading in both the time and frequency domains. Further, we assume a rich-scattering
environment such that all the fading gains are Gaussian distributed. This model can be visualized as in Figure 2,
where we divide the time-frequency plane into blocks of duration $T_c$ and bandwidth $W_c$. We assume that the
fading is fixed in each block and independent from one block to another. In each block, we can transmit $W_cT_c$
symbols, from the dimensionality theorem [13]. We let $D = W_cT_c$ and refer to $D$ as the coherence dimension of
the channel.

Figure 2: Doubly-block fading in the time-frequency plane.

For this channel model, we can represent the channel by

$$y_l = H_l x_l + w_l, \quad 1 \le l \le B, \tag{7}$$

where $x_l, y_l \in \mathbb{C}^D$. In other words, we have $B$ parallel vector channels, each with dimension $D$. Similar to
the AWGN channel, we assume a power constraint $P$ (joules per second) for the fading channel, i.e., we
have the following constraint on the input of the channel (7):

$$\sum_{l=1}^{B} E\left[\|x_l\|^2\right] \le PT_c. \tag{8}$$

The doubly-block fading model is a simple approximation of the physical multipath fading channel. However,
it retains most of the important characteristics of channels in a fading environment. For a derivation of such a
model, we refer the interested reader to [1]. This model has been used in [9] to obtain the lower bound for the
optimal bandwidth where spreading still increases non-coherent channel capacity. In [6], Hajek and Subramanian
use this model to calculate the reliability function and capacity for a non-coherent fading channel with a small
peak constraint on the input signals. However, this model is simpler than the model used by Médard and Gallager
[5], which allows correlation in both time and frequency blocks, or the model used by Telatar and Tse [11], which
allows correlation in frequency blocks.

In the wideband regime, the available bandwidth satisfies $BW_c \gg 1$ and the energy available per degree of
freedom is small, i.e., $\frac{P}{BW_c} \ll 1$. A large bandwidth can result from either a large $B$ or a large $W_c$.
However, $B$ and $W_c$ have different impacts on the channel performance, and the asymptotic results in $B$ and $W_c$
can be very different from each other and can lead to different conclusions. In this paper, we focus on the case
where $W_c$ is large. In this regime, we have many degrees of freedom in each coherence block although the energy
per degree of freedom is small. Thus, we might still be able to measure the channel accurately, and we therefore
assume a coherent fading channel model in this regime. However, accurately characterizing the coherence level of
this channel model from an error-exponent point of view remains an open research topic. We refer the reader to
[17] for a discussion on the relationship between coherence level and coherence length from a capacity point of
view.

The ergodic capacity of such channels under full receiver-side CSI is well known and is given by the
following expression:

$$C = BW_c \, E_H\left[\ln\left(1 + \frac{|H|^2 P}{BW_c}\right)\right] \text{ nats per second}. \tag{9}$$
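
A minimal numerical sketch of the ergodic capacity expression (9), $C = BW_c E_H[\ln(1 + |H|^2 P/(BW_c))]$, assuming Rayleigh fading normalized so that $|H|^2$ is exponentially distributed with unit mean (a normalization not stated explicitly above). The expectation is approximated with a trapezoid rule; the function name, grid size, and truncation point are illustrative.

```python
import math


def ergodic_capacity(power, blocks, w_c, steps=20000, g_max=40.0):
    """Approximate B*W_c*E[ln(1 + g*P/(B*W_c))] in nats/s, with g = |H|^2 ~ Exp(1) (assumed)."""
    snr = power / (blocks * w_c)  # SNR per degree of freedom
    h = g_max / steps
    total = 0.0
    for k in range(steps + 1):
        g = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        # exp(-g) is the density of the unit-mean exponential fading power.
        total += weight * math.exp(-g) * math.log1p(g * snr)
    return blocks * w_c * total * h


if __name__ == "__main__":
    for w_c in (10.0, 100.0, 1000.0):
        value = ergodic_capacity(power=1.0, blocks=1.0, w_c=w_c)
        print(f"W_c = {w_c:7.1f}  C = {value:.6f}")
```

As $W_c$ grows, the value approaches $P$ from below, in line with $\ln(1 + x) \le x$.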

The reliability function $E(R, P, W_c)$ of this channel can be defined as

$$E(R, P, W_c) = \lim_{N \to \infty} -\frac{\ln P_e(N, R, P, W_c)}{NT_c}, \tag{10}$$

where $P_e(N, R, P, W_c)$ is the minimum probability of decoding error for all block codes with codeword length
$NT_c$ seconds and coding rate $R$ (nats per second).

Let $R_z(1/W_c)$ denote the maximum rate at which communication is possible given the desired error
exponent $E(R, P, W_c) \ge z$. Our goal in studying this channel model in the wideband regime is again twofold:
calculate both $R_z(0)$ and $\dot{R}_z(0)$, and identify signaling schemes that can achieve $R_z(0)$ and $\dot{R}_z(0)$.

3 Main results 

In this section, we present our main results for AWGN channels and coherent fading channels in two separate
subsections without proof. Due to the technical nature of the proofs, we present them in Sections 4 and 5.

3.1 AWGN channels 

We begin by carefully describing the set of signaling schemes that we consider in this paper. Due to a
technicality in applying the sphere-packing bound (see Appendix A for a short review), we only consider input
distributions with a finite alphabet. Specifically, we restrict ourselves to input distributions in the following set.

Definition 1 Define

$$\mathcal{D}(p) = \{q(x) : E[|x|^2] = p; \text{ the support of } q(x) \text{ is a finite set of discrete points in } \mathbb{C}\}.$$



We impose the following additional constraint on the signaling schemes.

Definition 2 Define $\mathcal{Q}(p)$ as the subset of $\mathcal{D}(p)$ that satisfies the following property:

$$\mathcal{Q}(p) = \{q_p(x) \in \mathcal{D}(p) : |x|_{\max} \le K_m p^{\alpha}\}, \tag{11}$$

where $|x|_{\max}$ denotes the largest norm among all symbols of the input alphabet. $K_m$ and $\alpha$ are allowed to be
any positive constants that are independent of $p$.

In other words, we constrain the input such that the largest-magnitude symbol has to decrease as $B$ increases,
although it can decrease at an arbitrarily slow rate. As we will show later, the choice of the parameters $K_m$ and $\alpha$
is not relevant to the result. Thus, $K_m$ can be an arbitrarily large number and $\alpha$ can be an arbitrarily small positive
number, if we want to make the constraint mild.

A signaling scheme is a sequence of input distributions, parameterized by $B$. For each $B$, we can only choose
an input distribution from the set $\mathcal{Q}(P/B)$.

Definition 3 We define $\mathcal{F}(P)$ to be the set of signaling schemes that are parameterized by $B$ and satisfy

$$\mathcal{F}(P) = \{\{q_B(x)\} : q_B(x) \in \mathcal{Q}(P/B)\}, \tag{12}$$

where $\mathcal{Q}(P/B)$ is defined by Definition 2.

By choosing signaling schemes from $\mathcal{F}(P)$, we are ruling out those peaky signaling schemes in which one
of the input symbols remains constant or goes to $\infty$, while the average power per degree of freedom goes to 0.

Under these constraints on the input distribution, we now specify the reliability function $E(R, P, B)$ defined
by (5) for AWGN channels.

Lemma 1 Consider the discrete-time additive Gaussian channel (3) with bandwidth $B/2$ and input signaling
schemes constrained by $\mathcal{F}(P)$. Then the reliability function for this channel satisfies

$$E_r(R, P, B) \le E(R, P, B) \le E_{sp}(R, P, B) \tag{13}$$

with

$$E_r(R, P, B) = \sup_{0 \le \rho \le 1} -\rho R + B E_0(P/B, \rho), \tag{14}$$

$$E_{sp}(R, P, B) = \sup_{\rho \ge 0} -\rho R + B E_0(P/B, \rho),$$

$$E_0(P/B, \rho) = \sup_{q \in \mathcal{Q}(P/B)} \sup_{\beta \ge 0} -\ln \int \left( \int q(x) e^{\beta(|x|^2 - P/B)} f_w(y - x)^{\frac{1}{1+\rho}} \, dx \right)^{1+\rho} dy, \tag{15}$$

where $f_w(x)$ is the probability density function of a complex Gaussian random variable $\mathcal{CN}(0, 1)$.

Proof: This directly follows from the discussion on error exponents in Appendix A.

Remarks: The most important fact here is that, as we point out in Appendix A, there exists a critical rate
$R_{crit}$ such that for $R \ge R_{crit}$, the sphere-packing bound and the random-coding bound coincide,
and thus the random-coding exponent (14), with $E_0$ given by (15), is the true reliability function. Based on this fact, if
we only focus on this rate region, by characterizing the asymptotic behavior of (14) when $B$ is large, we get the
asymptotic behavior of the reliability function. In the following theorem, we obtain closed-form expressions for
$R(0)$ and $\dot{R}(0)$.

Theorem 1 Consider the discrete-time additive Gaussian channel with bandwidth $B/2$ and input signaling
schemes constrained by $\mathcal{F}(P)$. Let $R(1/B)$ be the maximum rate at which information can be transmitted on
this channel such that the following error-exponent constraint is satisfied:

$$E(R, P, B) \ge Pz, \quad 0 < z < \frac{1}{4}. \tag{16}$$

We have

$$R(0) = \lim_{B \to \infty} R(1/B) = P(1 - \sqrt{z})^2, \tag{17}$$

and

$$\dot{R}(0) = -\frac{P^2 (1 - \sqrt{z})^3}{2}. \tag{18}$$



Remarks: The constraint on $z$ in (16) arises from the fact that the reliability function is only determined for a
certain range of $z$. Outside this range, the random-coding exponent is not necessarily tight. As we will show later,
$z = \frac{1}{4}$ is the error exponent for $R = R_{crit}$ in the infinite-bandwidth limit. We now argue that for $0 < z < \frac{1}{4}$,
when the bandwidth is sufficiently large, the solution $R(1/B)$ to (16) will exceed $R_{crit}(1/B)$ and thus the
error exponent at $R(1/B)$ is equal to the random-coding exponent. To be precise, we state this argument in the
following lemma and provide the proof in the appendix. It follows from this lemma that we can represent the
reliability function by the random-coding exponent if we only consider $z < \frac{1}{4}$.

Lemma 2 Let $R_r(1/B)$ be the solution to the random-coding exponent constraint $E_r(R, P, B) = Pz$ for a fixed
$z \in (0, \frac{1}{4})$. For a fixed $z < \frac{1}{4}$, there exists a $B_z < \infty$ such that for all $B > B_z$, $R(1/B) = R_r(1/B)$.

Proof: See the appendix.

It should be noted that the constraints on the input signaling are not necessary to obtain the first-order result
(17). In other words, introducing peakiness or allowing continuous alphabet symbols in the input distributions
will not improve the error exponent in the infinite-bandwidth limit for the AWGN channel. These constraints
only play a role in obtaining the second-order terms in the expansion of $R_z(1/B)$ around $1/B = 0$.
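
The closed forms in Theorem 1 are easy to evaluate numerically. The sketch below assumes the first-order term $R(0) = P(1 - \sqrt{z})^2$ and reads the second-order term as $\dot{R}(0) = -P^2(1 - \sqrt{z})^3/2$, which reduces to the capacity slope $-P^2/2$ as $z \to 0$; the function names and the two-term approximation helper are illustrative.

```python
import math


def _check_exponent(z):
    if not 0.0 < z < 0.25:
        raise ValueError("the normalized exponent z must lie in (0, 1/4)")


def first_order_rate(power, z):
    """R(0) = P (1 - sqrt(z))^2, the infinite-bandwidth rate under E >= P z."""
    _check_exponent(z)
    return power * (1.0 - math.sqrt(z)) ** 2


def second_order_slope(power, z):
    """Assumed reading of the second-order term: -P^2 (1 - sqrt(z))^3 / 2."""
    _check_exponent(z)
    return -0.5 * power ** 2 * (1.0 - math.sqrt(z)) ** 3


def wideband_rate(power, z, bandwidth):
    """Two-term approximation R(1/B) ~ R(0) + Rdot(0) / B, meant for large B only."""
    return first_order_rate(power, z) + second_order_slope(power, z) / bandwidth


if __name__ == "__main__":
    for bandwidth in (10.0, 100.0, 1000.0):
        print(bandwidth, wideband_rate(power=1.0, z=0.1, bandwidth=bandwidth))
```

The two-term expression is only an approximation for large $B$; it says nothing about the rate at moderate bandwidths.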

A main goal of our study of the wideband reliability function is to find good signaling schemes in the
sense that they can achieve $R(0)$ and $\dot{R}(0)$. To do that, we first formally define first-order optimality and near-optimality
(or second-order optimality) of a signaling scheme in the wideband regime, in a similar way as in [14].

Definition 4 Consider a signaling scheme $\{q_B(x)\} \in \mathcal{F}(P)$ parameterized by $B$. Let $\tilde{R}(1/B)$ be the solution
of

$$Pz = E(\tilde{R}, q_B, P, B), \tag{19}$$

where $E(R, q_B, P, B)$ is the reliability function of the channel when the input distribution is fixed to be $q_B$. This
signaling scheme is said to be first-order optimal with respect to the normalized error exponent $z$ if

$$\tilde{R}(0) = R(0).$$

Definition 5 A signaling scheme $\{q_B(x)\} \in \mathcal{F}(P)$ is called second-order optimal or near-optimal with respect
to the normalized error exponent $z$ if

$$\tilde{R}(0) = R(0); \tag{20}$$

$$\dot{\tilde{R}}(0) = \dot{R}(0), \tag{21}$$

where $\tilde{R}(1/B)$ is the solution to (19).



For AWGN channels, we obtain a sufficient condition for a signaling scheme to be first-order optimal. Then,
we study the performance of two simple signaling schemes as in [14]: BPSK and QPSK. Specifically, when we
say BPSK or QPSK, we mean the following. Let $p = P/B$ be the available power per degree of freedom. For
BPSK, we choose the input to be either $\sqrt{p}$ or $-\sqrt{p}$ with equal probability; for QPSK, the input alphabet consists
of $\sqrt{p/2}\,(1 + j)$, $\sqrt{p/2}\,(1 - j)$, $\sqrt{p/2}\,(-1 + j)$, and $\sqrt{p/2}\,(-1 - j)$, all chosen with equal probability as well.

Figure 3: The maximal rate $R$ for BPSK and QPSK for a fixed normalized error exponent $z = 0.1$.
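
The BPSK and QPSK constellations can be written down directly. The following sketch builds them for a given power per degree of freedom $p$ and checks the average power and the symmetry around 0; the helper names are illustrative, and the second moment $E[x^2]$ printed at the end is an extra diagnostic that is not used in the text.

```python
import math


def bpsk(p):
    """BPSK alphabet {+sqrt(p), -sqrt(p)}, used with equal probabilities."""
    a = math.sqrt(p)
    return [complex(a, 0.0), complex(-a, 0.0)]


def qpsk(p):
    """QPSK alphabet sqrt(p/2) * (+-1 +- j), used with equal probabilities."""
    a = math.sqrt(p / 2.0)
    return [complex(a, a), complex(a, -a), complex(-a, a), complex(-a, -a)]


def mean(values):
    """Average of a list of (possibly complex) numbers."""
    return sum(values) / len(values)


def average_power(points):
    """E[|x|^2] under the uniform distribution on the alphabet."""
    return mean([abs(x) ** 2 for x in points])


if __name__ == "__main__":
    p = 0.3  # illustrative power per degree of freedom
    for name, points in (("BPSK", bpsk(p)), ("QPSK", qpsk(p))):
        print(name, "power:", average_power(points), "mean:", mean(points),
              "E[x^2]:", mean([x * x for x in points]))
```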



Theorem 2 For AWGN channels, all signaling schemes in $\mathcal{F}(P)$ that are symmetric around 0 are first-order
optimal for any given $z \in (0, \frac{1}{4})$. Thus, both BPSK and QPSK are first-order optimal; however, only QPSK is
second-order optimal.

Remarks: From this theorem, we know that it does not take much for a signaling scheme to be first-order
optimal. This result is consistent with the capacity result shown by Massey in [7].

To get a better feel for how differently BPSK and QPSK behave in the wideband regime, we plot $R$ as a
function of $1/B$ for both BPSK and QPSK in Figure 3. As shown in Figure 3, as $B \to \infty$, both BPSK and QPSK
can achieve the optimal rate $R(0)$. However, only QPSK can achieve $\dot{R}(0)$.
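
A curve like the one in Figure 3 can be approximated numerically. The sketch below is one possible way to do it, not necessarily the procedure used for the figure: it fixes the input to BPSK or QPSK, evaluates Gallager's function $E_0$ for unit-variance complex Gaussian noise by a trapezoid rule, and takes the largest rate whose random-coding exponent is at least $pz$, i.e. $r = \sup_{0 < \rho \le 1} (E_0(\rho, p) - pz)/\rho$ with $p = P/B$. It assumes that the random-coding exponent is tight at the rates of interest (as stated for $z < 1/4$ and large $B$); all function names, grid sizes, and integration limits are illustrative.

```python
import math


def gaussian_mean(func, half_width=8.0, steps=800):
    """Approximate E[func(t)] for real t ~ N(0, 1/2) with the trapezoid rule."""
    h = 2.0 * half_width / steps
    terms = []
    for k in range(steps + 1):
        t = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        terms.append(weight * math.exp(-t * t) * func(t))
    return math.fsum(terms) * h / math.sqrt(math.pi)


def e0(scheme, rho, p):
    """Gallager's E_0(rho, p) in nats per symbol for a fixed constellation.

    Both constellations have constant modulus, so the power-constraint factor
    exp(beta * (|x|^2 - p)) equals 1 and needs no optimization over beta.
    """
    s = 1.0 / (1.0 + rho)
    if scheme == "qpsk":
        # Real and imaginary parts decouple into two identical real channels.
        a = s * math.sqrt(2.0 * p)
        m = gaussian_mean(lambda t: math.cosh(a * t) ** (1.0 + rho))
        return p - 2.0 * math.log(m)
    if scheme == "bpsk":
        # Only the real part of the channel output carries signal.
        a = 2.0 * s * math.sqrt(p)
        m = gaussian_mean(lambda t: math.cosh(a * t) ** (1.0 + rho))
        return p - math.log(m)
    raise ValueError("scheme must be 'bpsk' or 'qpsk'")


def max_rate(scheme, power, z, bandwidth, grid=200):
    """Largest rate (nats/s) whose random-coding exponent is at least P*z."""
    p = power / bandwidth  # power per degree of freedom
    best = -math.inf
    for k in range(1, grid + 1):
        rho = k / grid  # grid over (0, 1]
        best = max(best, (e0(scheme, rho, p) - p * z) / rho)
    return bandwidth * best


if __name__ == "__main__":
    for bandwidth in (10.0, 100.0, 1000.0):
        rq = max_rate("qpsk", 1.0, 0.1, bandwidth)
        rb = max_rate("bpsk", 1.0, 0.1, bandwidth)
        print(f"B = {bandwidth:7.1f}  QPSK: {rq:.6f}  BPSK: {rb:.6f}")
```

With $P = 1$ and $z = 0.1$, both computed rates should approach $(1 - \sqrt{0.1})^2$ as $B$ grows, with QPSK approaching it faster than BPSK.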

Another way to understand the difference between the performance of BPSK and QPSK is to study the
fundamental tradeoff between spectral efficiency and energy per information bit ($E_b/N_0$), as suggested in [14].
We plot this tradeoff in Figure 4. From this figure, we can see that both BPSK and QPSK can achieve the optimal
$(E_b/N_0)_{\min}$; however, only QPSK can achieve the optimal spectral efficiency slope at the point $(E_b/N_0)_{\min}$.
Compared to Figure 2 in [14], the major difference here is that $(E_b/N_0)_{\min}$ in Figure 4 is around 3.3 dB higher,
since we have a more stringent constraint than just reliable communication, as considered in [14]. $(E_b/N_0)_{\min}$ here
denotes the minimal energy per information bit such that the probability of error decays faster than $e^{-Nz}$
as the codeword length $N$ increases.

Figure 4: Spectral efficiencies achieved by QPSK and BPSK in the AWGN channel, when the error exponent is
constrained by $z = 0.1$.
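
The 3.3 dB figure can be checked from the first-order result alone. Assuming $R(0) = P(1 - \sqrt{z})^2$ nats per second with $N_0 = 1$, the minimum energy per nat is $P/R(0) = (1 - \sqrt{z})^{-2}$, to be compared with 1 when only reliable communication is required. The sketch below, with illustrative function names, converts this ratio to decibels.

```python
import math


def energy_penalty_db(z):
    """Extra energy per bit (dB) needed for normalized exponent z, relative to z = 0."""
    if not 0.0 <= z < 0.25:
        raise ValueError("z must lie in [0, 1/4)")
    # 10*log10(1 / (1 - sqrt(z))^2) = -20*log10(1 - sqrt(z))
    return -20.0 * math.log10(1.0 - math.sqrt(z))


def min_ebn0_db(z):
    """Minimum Eb/N0 in dB: ln(2) nats per bit times the minimum energy per nat."""
    return 10.0 * math.log10(math.log(2.0)) + energy_penalty_db(z)


if __name__ == "__main__":
    print(f"penalty at z = 0.1: {energy_penalty_db(0.1):.2f} dB")
    print(f"min Eb/N0 at z = 0: {min_ebn0_db(0.0):.2f} dB")
    print(f"min Eb/N0 at z = 0.1: {min_ebn0_db(0.1):.2f} dB")
```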

3.2 Coherent fading channels 

Next, we consider coherent fading channels. As in the case of the AWGN channel, we first describe our assumptions
on the input signaling schemes.

Definition 6 Define $\mathcal{Q}_{W_c}(P)$ to be the set of joint input distributions on $X = (x_1, x_2, \ldots, x_B)$, where
$\{x_l, l = 1, 2, \ldots, B\}$ are vectors with dimension $D = W_cT_c$, which satisfy the following:

1. the average power constraint (8) is satisfied;

2. the distribution has a discrete alphabet, consisting of a finite number of symbols;

3. each symbol is chosen from a given set $S_{W_c}$, defined as follows:

$$S_{W_c} = \left\{X = \{x_1, x_2, \ldots, x_B\} : x_l \in \mathbb{C}^D, \ \max_{d = 1, 2, \ldots, D} |x_{ld}| \le K_m W_c^{-\alpha}, \ \forall l = 1, 2, \ldots, B\right\}, \tag{22}$$

where $K_m$ and $\alpha$ are allowed to be any positive constants independent of $W_c$.



The signaling schemes of interest to us are defined as follows.

Definition 7 We define $\mathcal{F}_{W_c}(P)$ to be the set of signaling schemes that are parameterized by $W_c$ and satisfy

$$\mathcal{F}_{W_c}(P) = \{\{q_{W_c}(X)\} : q_{W_c}(X) \in \mathcal{Q}_{W_c}(P)\}, \tag{23}$$

where $\mathcal{Q}_{W_c}(P)$ was defined in Definition 6.

The reliability function for our discrete-time channel model (7) with signaling schemes constrained by
$\mathcal{F}_{W_c}(P)$ can be computed according to the following lemma.

Lemma 3 Consider the coherent fading channel model (7) with $H$ known at the receiver. Assume that the input
distribution satisfies the average power constraint (8) and the constraints in $\mathcal{F}_{W_c}(P)$. The reliability function
$E(R, P, W_c)$ satisfies

$$E_r(R, P, W_c) \le E(R, P, W_c) \le E_{sp}(R, P, W_c),$$

with

$$E_r(R, P, W_c) = \sup_{0 \le \rho \le 1} -\rho R + E_0(P, \rho, W_c),$$

$$E_{sp}(R, P, W_c) = \sup_{\rho \ge 0} -\rho R + E_0(P, \rho, W_c),$$

$$E_0(P, \rho, W_c) = \sup_{q \in \mathcal{Q}_{W_c}(P)} \sup_{\beta \ge 0} -\frac{1}{T_c} \ln E_H \int \left( \int q(X) e^{\beta(\|X\|^2 - PT_c)} f(Y|X, H)^{\frac{1}{1+\rho}} \, dX \right)^{1+\rho} dY. \tag{24}$$

Proof: We can apply the theorems reviewed in Appendix A to this channel model by viewing the
channel as a memoryless channel with output $\tilde{Y} = \{Y, H\}$. The factor $\frac{1}{T_c}$ in (24) balances the scaling,
since the rate $R$ here is defined in nats per second.

The constraint on the error exponent is

$$E(R, P, W_c) \ge z, \tag{25}$$

and we need to solve for $R(0)$ and $\dot{R}(0)$, where $R$ is a function of $\frac{1}{W_c}$ for a fixed $B$. We have the following
theorem.

Theorem 3 Consider a coherent Rayleigh-fading vector channel (7) with the input signaling constrained by
$\mathcal{F}_{W_c}(P)$. Let $R(1/W_c)$ be the maximum rate at which information can be transmitted on this channel such that
the following error-exponent constraint is satisfied:

$$E(R, P, W_c) \ge z, \quad 0 < z < z^*, \tag{26}$$

where $z^*$ is defined as follows:

$$z^* = \frac{B}{T_c} \ln\left(1 + \frac{PT_c}{2B}\right) - \frac{P}{4 + 2PT_c/B}. \tag{27}$$

We have

$$R(0) = \lim_{W_c \to \infty} R(1/W_c) = \sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{B}{T_c} \frac{\ln\left(1 + \frac{\rho PT_c}{(1+\rho)B}\right)}{\rho}, \tag{28}$$

and

$$\dot{R}(0) = -\frac{P^2}{B(1 + \rho^*)\left(1 + \rho^* + \rho^* \frac{PT_c}{B}\right)^2}, \tag{29}$$

where $\rho^*$ is the optimizing $\rho$ in (28).

The constraint on $z$ in (26) again comes from the fact that the reliability function is only known when $R \ge R_{crit}$.
Now we show that $z^*$ given by (27) is the corresponding error exponent at $R_{crit}$ when $W_c$ goes to infinity.
From the property of the critical rate $R_{crit}$, we know the optimizing $\rho$ in (28) at the corresponding error exponent
$z_{crit}$ is 1. Thus, taking the derivative of the right side of (28) with respect to $\rho$, we must have

$$\left[ \frac{z_{crit}}{\rho^2} - \frac{B \ln\left(1 + \frac{\rho PT_c}{(1+\rho)B}\right)}{\rho^2 T_c} + \frac{B}{\rho T_c} \cdot \frac{PT_c/B}{(1+\rho)^2 \left(1 + \frac{\rho PT_c}{(1+\rho)B}\right)} \right]_{\rho=1} = 0.$$

Solving this, it is straightforward to obtain $z_{crit} = z^*$ with $z^*$ determined by (27). The corresponding rate $R_{crit}$
can be obtained as follows:

$$R_{crit} = -z_{crit} + \frac{B}{T_c} \ln\left(1 + \frac{PT_c}{2B}\right) = \frac{P}{4 + 2PT_c/B}.$$

Using a similar argument as in the AWGN channel case, we can argue that for $z \in (0, z^*)$, the reliability function
coincides with the random-coding exponent for sufficiently large $W_c$. Thus, the calculation of $R(0)$ and $\dot{R}(0)$
can be carried out by using the random-coding exponent.

Another observation here is that the applicable region (in terms of $R$), where the random-coding exponent
coincides with the sphere-packing exponent, actually covers most of the rate region from 0 to capacity when the
available energy per coherence block $\frac{PT_c}{B}$ is fairly large. To see this, we first notice that as $W_c$ goes to infinity,
the capacity in (9) is $C_\infty = P$. Thus, the critical rate $R_{crit}$ can also be written as $\frac{C_\infty}{4 + 2PT_c/B}$. When $\frac{PT_c}{B}$ is large,
we have $R_{crit} \ll C_\infty$. This observation is also shown in Figure 5. For simplicity, we choose $B = T_c = 1$ in
this numerical example and choose $P = 100$.
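
The limiting quantities for the coherent fading channel can be evaluated with a short script. The sketch below assumes the infinite-$W_c$ expressions $z^* = \frac{B}{T_c}\ln\left(1 + \frac{PT_c}{2B}\right) - \frac{P}{4 + 2PT_c/B}$ and $R(0) = \sup_{0 < \rho \le 1} \left[-z + \frac{B}{T_c}\ln\left(1 + \frac{\rho PT_c}{(1+\rho)B}\right)\right]/\rho$, and approximates the supremum on a grid; the function names and the grid size are illustrative.

```python
import math


def e0_limit(rho, power, blocks, t_c):
    """(B/T_c) * ln(1 + rho*P*T_c / ((1 + rho)*B)): assumed limit of E_0 as W_c -> infinity."""
    energy = power * t_c / blocks  # energy per coherence block
    return (blocks / t_c) * math.log1p(rho * energy / (1.0 + rho))


def fading_first_order_rate(z, power, blocks, t_c, grid=10000):
    """Grid approximation of sup over rho in (0, 1] of (E_0(rho) - z) / rho."""
    best = -math.inf
    for k in range(1, grid + 1):
        rho = k / grid
        best = max(best, (e0_limit(rho, power, blocks, t_c) - z) / rho)
    return best


def z_star(power, blocks, t_c):
    """Error exponent at the critical rate in the infinite-W_c limit."""
    energy = power * t_c / blocks
    return (blocks / t_c) * math.log1p(energy / 2.0) - power / (4.0 + 2.0 * energy)


def critical_rate(power, blocks, t_c):
    """R_crit = P / (4 + 2*P*T_c/B)."""
    return power / (4.0 + 2.0 * power * t_c / blocks)


if __name__ == "__main__":
    power, blocks, t_c = 100.0, 1.0, 1.0  # values of the numerical example in the text
    print("z*     =", z_star(power, blocks, t_c))
    print("R_crit =", critical_rate(power, blocks, t_c))
    print("R(0) at z = z*:", fading_first_order_rate(z_star(power, blocks, t_c), power, blocks, t_c))
```

With these values the critical rate is a small fraction of the limiting capacity $P = 100$, which matches the observation that the random-coding exponent is tight over most of the rate region.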

Figure 5: The error exponent curve from $R_{crit}$ to capacity for the channel with infinite coherence dimension.
$B = T_c = 1$, $P = 100$.

Next, we need to identify those signaling schemes which can achieve $R(0)$ and $\dot{R}(0)$. Again, we consider
BPSK and QPSK signaling. However, for the fading channel (7), these two signaling schemes have slightly different
meanings than what we defined in the last section for AWGN channels. Specifically, for both BPSK and QPSK,
we spread the available power equally among all the time-frequency coherence blocks and among all dimensions
within each block, and make the distributions in each dimension i.i.d. For BPSK, the symbols for each dimension
are $\sqrt{P/(BW_c)}$ and $-\sqrt{P/(BW_c)}$, with equal probability. For QPSK, the symbols are $\sqrt{\frac{P}{2BW_c}}(1 + j)$,
$\sqrt{\frac{P}{2BW_c}}(1 - j)$, $\sqrt{\frac{P}{2BW_c}}(-1 + j)$, and $\sqrt{\frac{P}{2BW_c}}(-1 - j)$. Similar to the AWGN case, we have

Theorem 4 Both BPSK and QPSK are first-order optimal for any given $z \in (0, z^*)$; however, only QPSK is
second-order optimal.

3.3 Implications and discussion 

The results that we have obtained for both AWGN channels and coherent fading channels are consistent with the
results from a capacity point of view in the seminal work [14]. By letting $z$ go to 0, the quantity $R_z$ becomes the
capacity of the channel. Thus, it can be easily checked that by taking $z$ to be 0, we can recover the capacity results
by using the expressions in Theorem 1 and Theorem 3. However, we also have to point out that in [14], a very
general treatment is provided for a much broader class of channel models. In this paper, due to the complexity
of the calculation of the reliability function, we only calculate the first- and second-order rate approximations for
two very specific channel models.

Despite the similarity between our results and Verdú's results regarding near-optimal signaling, the fact that
QPSK is still near-optimal under an error-exponent constraint is somewhat surprising, for
the following reason. In general, very little is known about the conditions under which an input distribution
achieves the optimal error exponent at a given rate, even in the infinite-bandwidth limit. It is not necessarily
true that capacity-achieving distributions are also optimal from an error-exponent point of view. One example is
the infinite-bandwidth non-coherent Rayleigh fading channel, which is studied in [16]. Thus, it is not obvious
that QPSK can do well in the wideband regime from an error-exponent point of view, even though it is
wideband optimal from a capacity point of view.

4 Proof of Theorem 1 and Theorem 2

Due to the technical nature of the calculations needed in the proofs of our main results, we first summarize the
proof steps to help the reader follow the proofs.
The proof of Theorem 1 can be broken down into the following major steps:

1. We first relate the problem of finding $R(0)$ and $\dot{R}(0)$, where $R$ is the communication rate per second as a
function of $1/B$, to the problem of finding $\dot{r}(0)$ and $\ddot{r}(0)$, where $r$ is the communication rate per degree of
freedom in (3) as a function of $p$, which denotes the SNR per degree of freedom.

2. The calculation of $\dot{r}(0)$ can be related to the optimal value of $E_0$ in the infinite-bandwidth limit; an upper
bound is derived for $E_0$ using a simple inequality; this bound is further shown to be achievable.

3. $\ddot{r}(0)$ can also be related to certain derivatives of $E_0$; a better upper bound is derived for $E_0$, which yields an
upper bound for $\ddot{r}(0)$; this bound is also shown to be achievable.

The next several subsections prove the main results following these three steps.

4.1 Communication rate and error exponent per degree of freedom

It is shown in [14] that the capacity $C$ of a bandlimited channel with limited available power $P$, but large available
bandwidth $B$, can be related to the capacity $c$ of a scalar channel with small available power $p = P/B$. Thus,
the problem of finding the optimal $C(0)$ and $\dot C(0)$ can be shown to be equivalent to the problem of finding the optimal
$\dot c(0)$ and $\ddot c(0)$. The relationship between $C(0)$ and $\dot c(0)$ is also studied extensively in an earlier paper [13], where
the notion of capacity per unit cost was studied. We first show that a similar connection can be made between the
error-exponent constrained rates $R$ (nats per second) and $r$ (nats per symbol).
Theorem 5 Consider a scalar Gaussian channel $y = x + w$ with average power constraint $p$. Further, the
signaling schemes are constrained by $\mathcal{F}(p) = \{\{q_p(x)\} : q_p(x) \in \mathcal{Q}(p)\}$. Let $r$ be the maximum rate per symbol
at which information can be transmitted through this channel such that the error exponent satisfies

$$E(r,p) \ge pz, \qquad 0 < z < \tfrac{1}{4},$$

where $E(r,p)$ is the error exponent per symbol of the scalar channel with power constraint $p$. Consider $r$ as a
function of $p$, and let $R$ (nats per second) be the error-exponent constrained rate of the bandlimited channel defined earlier. We have

$$R(0) = P\,\dot r(0); \qquad \dot R(0) = \frac{P^2}{2}\,\ddot r(0).$$

Proof: It is easy to check that

$$E(R, P, B) = B\,E(R/B,\, P/B).$$

Denoting $r = R/B$ and $p = P/B$, the original error-exponent constraint can be rewritten as

$$E_r(r,p) \ge pz.$$

Using these two relations and considering $R$ as a function of $b = 1/B$, we have

$$R(0) = \lim_{b\to 0} R(b) = \lim_{b\to 0} \frac{r(Pb)}{b} = P \lim_{p\to 0} \frac{r(p)}{p} = P\,\dot r(0), \tag{30}$$

$$\dot R(0) = \lim_{b\to 0} \frac{R(b) - R(0)}{b} = \lim_{b\to 0} \frac{r(Pb) - Pb\,\dot r(0)}{b^2} = \frac{P^2}{2}\,\ddot r(0). \tag{31}$$

□
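
The scaling in (30) and (31) can be sanity-checked numerically. The sketch below is only an illustration: it assumes a hypothetical rate function $r(p) = a p + c p^2/2$ (the coefficients `a`, `c` and the function name `wideband_limits` are not from the paper) and checks that $R(b) = r(Pb)/b$ approaches $P a$ and that $(R(b) - R(0))/b$ approaches $P^2 c/2$ for small $b$.

```python
def wideband_limits(a, c, P, b=1e-6):
    """Toy check of R(b) = r(P*b)/b using the hypothetical r(p) = a*p + c*p**2/2."""
    r = lambda p: a * p + 0.5 * c * p * p   # illustrative rate per degree of freedom
    R = lambda bb: r(P * bb) / bb           # rate per second when 1/B = bb
    R0 = P * a                              # claimed limit P * r'(0)
    slope = (R(b) - R0) / b                 # finite-difference estimate of dR/db at b = 0
    claimed_slope = 0.5 * P * P * c         # claimed value (P^2 / 2) * r''(0)
    return R(b), R0, slope, claimed_slope


Rb, R0, slope, claimed_slope = wideband_limits(a=0.25, c=-0.125, P=2.0)
print(Rb, R0, slope, claimed_slope)
```

Here `a` plays the role of $\dot r(0)$ and `c` the role of $\ddot r(0)$; any smooth $r$ with $r(0)=0$ would behave the same way to this order.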

Thus, the original problem of finding $R(0)$ and $\dot R(0)$ in the wideband regime is equivalent to finding the
optimal values of $\dot r(0)$ and $\ddot r(0)$, given a constraint on the reliability function $E(r,p) \ge pz$. In the rest of this
paper, we will deal with this scalar channel problem. For notational convenience, we use $E(r,p)$ to denote the
error exponent per symbol of the scalar channel.

4.2 Optimal value of $\dot r(0)$

We know that for the error-exponent constraint in the range $(0, \tfrac14)$ and $p$ sufficiently small, we have

$$E(r,p) = E_r(r,p) = \sup_{0\le\rho\le 1}\; -\rho r + E_0(p,\rho),$$

where

$$E_0(p,\rho) = \sup_{q\in\mathcal{F}(p)}\ \sup_{\beta\ge 0}\ -\ln \int \left( \int q(x)\, e^{\beta(|x|^2 - p)} f(y|x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy. \tag{32}$$

Thus, the constraint on the error exponent can also be written as

$$pz = \sup_{0\le\rho\le 1}\; -\rho r + E_0(p,\rho). \tag{33}$$

The first result in the first-order calculation is the following lemma.

Lemma 4 For any $\rho \in [0,1]$, $E_0(p,\rho)$ is upper bounded by

$$E_0(p,\rho) \le \frac{\rho p}{1+\rho}. \tag{34}$$

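Proof: For notational convenience, define $\alpha(y)$ to be

$$\alpha(y) = \int q_p(x)\, e^{\beta(|x|^2-p)} f(y|x)^{\frac{1}{1+\rho}}\, dx \tag{35}$$

and $M(y)$ as

$$M(y) = \int q_p(x)\, e^{\beta(|x|^2-p)} \left( \frac{f(y|x)}{f(y|0)} \right)^{\frac{1}{1+\rho}} dx. \tag{36}$$

Here $f(y|0)$ denotes the density function of $y$ conditioned on the input being 0. It is easy to see that $f(y|0)$
is simply the density of the Gaussian noise, $f_w(y)$. Then we have

$$
\begin{aligned}
E_0(p,\rho) &= \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -\ln \int \alpha(y)^{1+\rho}\,dy \\
&= \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -\ln \int f_w(y)\, M(y)^{1+\rho}\,dy &(37)\\
&\le \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -\ln \left( \int f_w(y)\, M(y)\,dy \right)^{1+\rho} &(38)\\
&= \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -(1+\rho) \ln E_q\!\left[ e^{\beta(|x|^2-p)} \int f_w(y)^{\frac{\rho}{1+\rho}} f(y|x)^{\frac{1}{1+\rho}}\,dy \right] &(39)\\
&= \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -(1+\rho) \ln E_q\!\left[ e^{\beta(|x|^2-p)} \int f_w(y)^{\frac{\rho}{1+\rho}} f_w(y-x)^{\frac{1}{1+\rho}}\,dy \right] &(40)\\
&= \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -(1+\rho) \ln E_q\!\left[ e^{\beta(|x|^2-p)}\, e^{-\theta |x|^2} \right] &(41)\\
&\le \sup_{q\in\mathcal{F}(p)}\sup_{\beta\ge0} -(1+\rho) \ln e^{-\theta p} &(42)\\
&= \frac{\rho p}{1+\rho},
\end{aligned}
$$

where $\theta$ in (41) is defined by

$$\theta = \frac{\rho}{(1+\rho)^2}.$$

The inequalities in (38) and (42) are simple applications of Jensen's inequality. □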
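
The step from (40) to (41) rests on a Gaussian overlap integral. As a hedged numerical sketch (not part of the original proof), the code below checks the one-dimensional real analogue: with $g(t) = e^{-t^2}/\sqrt{\pi}$, the integral $\int g(t)^{\rho/(1+\rho)} g(t-a)^{1/(1+\rho)}\,dt$ should equal $e^{-\rho a^2/(1+\rho)^2}$. Since the unit-variance complex Gaussian density factors into two such real densities, the product over the real and imaginary parts gives $e^{-\theta|x|^2}$. The function names and integration parameters are illustrative assumptions.

```python
import math


def overlap_integral(a, rho, half_width=12.0, n=20000):
    """Trapezoid estimate of the integral of g(t)^(rho/(1+rho)) * g(t-a)^(1/(1+rho)),
    where g(t) = exp(-t^2)/sqrt(pi) is one real component of the complex noise density."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        t = -half_width + k * h
        val = math.exp(-(rho * t * t + (t - a) ** 2) / (1.0 + rho)) / math.sqrt(math.pi)
        total += val * (0.5 if k in (0, n) else 1.0)
    return total * h


def closed_form(a, rho):
    """exp(-theta * a^2) with theta = rho / (1 + rho)^2."""
    return math.exp(-rho * a * a / (1.0 + rho) ** 2)


max_gap = max(abs(overlap_integral(a, rho) - closed_form(a, rho))
              for a in (0.3, 1.0, 2.0) for rho in (0.2, 0.6, 1.0))
print(max_gap)
```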
The next theorem establishes an alternate expression for the error-exponent constraint (33).

Theorem 6 The error-exponent constraint (33) implies the following relationship between $r$ and $z$:

$$r = \sup_{0\le\rho\le1}\; -\frac{pz}{\rho} + \frac{E_0(p,\rho)}{\rho}. \tag{43}$$

Proof: See the Appendix. □

Since we want to study the first- and second-order derivatives of $r$ with respect to $p$ in the low-SNR regime, it
is more convenient to use (43). To obtain the first-order derivative, we first note from (43) that

$$\frac{r}{p} = \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(p,\rho)}{\rho p}.$$

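Now we relate $\dot r(0)$ to the first partial derivative of $E_0(p,\rho)$ with respect to $p$.

Theorem 7 If, as $p \to 0$, the limit of $E_0(p,\rho)/p$ exists for any $\rho \in [0,1]$, which is denoted as $\dot E_0(0,\rho)$, and further,

$$\frac{E_0(p,\rho)}{\rho p} \to \frac{\dot E_0(0,\rho)}{\rho} \quad \text{uniformly for } \rho\in[0,1],$$

we have

$$\dot r(0) = \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{\dot E_0(0,\rho)}{\rho}. \tag{44}$$

Proof: From the definition of uniform convergence, for any $\epsilon > 0$, we can find $\delta(\epsilon) > 0$ such that for any
$p < \delta(\epsilon)$, we have

$$\left| \frac{E_0(p,\rho)}{\rho p} - \frac{\dot E_0(0,\rho)}{\rho} \right| < \epsilon, \quad \forall \rho \in [0,1].$$

Thus, if we denote $K = \sup_{0\le\rho\le1} -\frac{z}{\rho} + \frac{\dot E_0(0,\rho)}{\rho}$, we have

$$\frac{r(p)}{p} \le \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{\dot E_0(0,\rho)}{\rho} + \epsilon = K + \epsilon.$$

Similarly, we can show that $\frac{r(p)}{p} \ge K - \epsilon$. Letting $\epsilon \to 0$, we have $\dot r(0) = \lim_{p\to0} \frac{r(p)}{p} = K$. □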



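Lemma 5 As $p \to 0$, $\frac{E_0(p,\rho)}{\rho p}$ converges to $\frac{1}{1+\rho}$ uniformly for $\rho \in [0,1]$.

Proof: In Lemma 4 we have already shown that

$$\frac{E_0(p,\rho)}{\rho p} \le \frac{1}{1+\rho}.$$

In the Appendix we will show that when the input distribution is chosen to be BPSK or QPSK, $\frac{E_0(p,q_p,\rho)}{\rho p}$ converges
uniformly to $\frac{1}{1+\rho}$. Since $\frac{E_0(p,\rho)}{\rho p}$ is lower bounded by $\frac{E_0(p,q_p,\rho)}{\rho p}$, the lemma follows. □

Using Lemma 5 and Theorem 7, we can compute $\dot r(0)$.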
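Proposition 1 For $0 < z < \tfrac14$,

$$\dot r(0) = (1 - \sqrt{z})^2. \tag{45}$$

Proof: From Theorem 7 we have

$$
\begin{aligned}
\dot r(0) &= \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{\dot E_0(0,\rho)}{\rho} &(46)\\
&= \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{1}{1+\rho} &(47)\\
&= (1-\sqrt{z})^2, \qquad 0 < z < \tfrac14.
\end{aligned}
$$

For $0 < z < \tfrac14$, the optimizing $\rho^* = \frac{\sqrt z}{1-\sqrt z}$. □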
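
The closed form in Proposition 1 is easy to confirm numerically. The following sketch (an illustration only; the grid size and the sample value of $z$ are arbitrary assumptions) evaluates the supremum in (47) by brute-force grid search and compares it with $(1-\sqrt z)^2$ and $\rho^* = \sqrt z/(1-\sqrt z)$.

```python
import math


def first_order_rate(z, grid=200000):
    """Grid search for sup over rho in (0, 1] of -z/rho + 1/(1+rho)."""
    best_val, best_rho = -float("inf"), None
    for k in range(1, grid + 1):
        rho = k / grid
        val = -z / rho + 1.0 / (1.0 + rho)
        if val > best_val:
            best_val, best_rho = val, rho
    return best_val, best_rho


z = 0.09  # any value in (0, 1/4) keeps the maximizer inside (0, 1)
value, rho_hat = first_order_rate(z)
rho_star = math.sqrt(z) / (1.0 - math.sqrt(z))
print(value, (1.0 - math.sqrt(z)) ** 2, rho_hat, rho_star)
```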
Note that the optimal value $\dot r(0)$ here is obtained by optimizing over all input distributions in $\mathcal{F}(p)$. However, this
result is valid for all input distributions. In other words, allowing a continuous alphabet or peaky signaling would
not change this optimal value. This is due to the well-known infinite-bandwidth AWGN channel error-exponent
result recalled in the Introduction: it can be easily seen that (45) is simply the inverse function of that reliability function. The purpose
of deriving $\dot r(0)$ under the constraint $\mathcal{F}(p)$ is not just to derive (45), but also to obtain conditions on the input
distributions in $\mathcal{F}(p)$ which achieve (45). We will obtain such conditions in the next subsection.

4.3 First-order optimality condition 

Next we study conditions for a sequence of input distributions to be first-order optimal. 

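Lemma 6 Assuming $0 < z < \tfrac14$, a sufficient condition for $\{q_p\} \in \mathcal{F}(p)$ to be first-order optimal is that

$$\lim_{p\to0} \frac{E_0(p, q_p, \rho^*)}{p} = \frac{\rho^*}{1+\rho^*}, \tag{48}$$

where $\rho^* = \frac{\sqrt z}{1-\sqrt z}$.

Proof: If $\lim_{p\to0} \frac{E_0(p,q_p,\rho^*)}{p} = \frac{\rho^*}{1+\rho^*}$, we have

$$
\begin{aligned}
\liminf_{p\to0} \frac{r}{p} &\ge \liminf_{p\to0}\; -\frac{z}{\rho^*} + \frac{E_0(p,q_p,\rho^*)}{p\rho^*} \\
&= -\frac{z}{\rho^*} + \lim_{p\to0} \frac{E_0(p,q_p,\rho^*)}{p\rho^*} \\
&= -\frac{z}{\rho^*} + \frac{1}{1+\rho^*} \\
&= (1-\sqrt z)^2.
\end{aligned}
$$

On the other hand, from Lemma 4 we know

$$
\begin{aligned}
\limsup_{p\to0} \frac{r}{p} &= \limsup_{p\to0} \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(p,q_p,\rho)}{\rho p} \\
&\le \limsup_{p\to0} \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(p,\rho)}{\rho p} \\
&\le \limsup_{p\to0} \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{1}{1+\rho} \\
&= (1-\sqrt z)^2.
\end{aligned}
$$

Thus, the limit of $\frac{r}{p}$ exists and we have

$$\dot r(0) = \lim_{p\to0} \frac{r}{p} = (1-\sqrt z)^2.$$

□

Actually, it does not take much to be first-order optimal.

Lemma 7 For a fixed $0 < z < \tfrac14$, a sequence of input distributions $q_p \in \mathcal{F}(p)$ is first-order optimal if it is
symmetric around 0.

Proof: See the Appendix. □

4.4 The optimal value of $\ddot r(0)$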

In this section, we will find an upper bound for $\ddot r(0)$ and later we will show that this value is achievable. To do
this, we first connect $\ddot r(0)$ to the second partial derivative of $E_0(p,\rho)$ with respect to $p$.

Theorem 8 Assume the second partial derivative of $E_0(p,\rho)$ with respect to $p$ at $p = 0$ (denoted as $\ddot E_0(0,\rho)$)
exists for any $\rho \in [0,1]$. Further, assume that

$$\frac{\dfrac{E_0(p,\rho)}{\rho p} - \dfrac{\dot E_0(0,\rho)}{\rho}}{p} \to \frac{\ddot E_0(0,\rho)}{2\rho} \quad \text{uniformly for } \rho \in [0,1],$$

and $\frac{\ddot E_0(0,\rho)}{\rho}$ is a continuous and bounded function of $\rho$ for $\rho \in [0,1]$. Then $\ddot r(0)$ can be determined by

$$\ddot r(0) = \frac{\ddot E_0(0,\rho^*)}{\rho^*}, \tag{49}$$

where $\rho^*$ is the optimal $\rho$ in (44) and is equal to $\frac{\sqrt z}{1-\sqrt z}$.
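Proof: First we show that

$$\ddot r(0) = \limsup_{p\to0} \frac{r(p) - p\,\dot r(0)}{p^2/2} \le \frac{\ddot E_0(0,\rho^*)}{\rho^*}.$$

The uniform convergence gives us: for any $\epsilon > 0$, we can find $\eta(\epsilon)$ such that for all $p < \eta(\epsilon)$,

$$\left| \frac{\dfrac{E_0(p,\rho)}{\rho p} - \dfrac{\dot E_0(0,\rho)}{\rho}}{p} - \frac{\ddot E_0(0,\rho)}{2\rho} \right| < \epsilon \quad \text{for all } \rho \in [0,1].$$

In other words, for $p < \eta(\epsilon)$, we can write

$$E_0(p,\rho) \le \dot E_0(0,\rho)\,p + \ddot E_0(0,\rho)\,p^2/2 + \rho\epsilon p^2.$$

From (43), we have

$$r(p) \le \sup_{0\le\rho\le1}\; -\frac{pz}{\rho} + \frac{\dot E_0(0,\rho)\,p + \ddot E_0(0,\rho)\,p^2/2}{\rho} + \epsilon p^2. \tag{50}$$

Assume $\rho(p)$ is the optimizing $\rho$ for (50). From the first-order calculation, we already know that

$$\frac{\dot E_0(0,\rho)}{\rho} = \frac{1}{1+\rho}.$$

Since the optimization in (50) is performed over a compact set $[0,1]$ and by assumption $\frac{\ddot E_0(0,\rho)}{\rho}$ is continuous
in $\rho$, the optimizing $\rho$ must exist. We must have

$$r(p) \le \left\{ \sup_{0\le\rho\le1}\; -\frac{pz}{\rho} + \frac{p\,\dot E_0(0,\rho)}{\rho} \right\} + \frac{\ddot E_0(0,\rho(p))\, p^2/2}{\rho(p)} + \epsilon p^2.$$

From (44), we know

$$\dot r(0)\,p = \sup_{0\le\rho\le1}\; -\frac{pz}{\rho} + \frac{p\,\dot E_0(0,\rho)}{\rho}.$$

This gives us

$$\frac{r(p) - p\,\dot r(0)}{p^2/2} \le \frac{\ddot E_0(0,\rho(p))}{\rho(p)} + 2\epsilon.$$

Letting $\epsilon$ go to 0, we have

$$\ddot r(0) = \limsup_{p\to0} \frac{r(p) - p\,\dot r(0)}{p^2/2} \le \limsup_{p\to0} \frac{\ddot E_0(0,\rho(p))}{\rho(p)} = \frac{\ddot E_0(0,\rho^*)}{\rho^*}, \tag{51}$$

where $\rho^*$ is the optimizing $\rho$ of (50) as $p$ goes to zero, and can be shown to be equal to $\frac{\sqrt z}{1-\sqrt z}$. The last equality in
(51) can be easily verified, given that $\frac{\ddot E_0(0,\rho)}{\rho}$ is a continuous function of $\rho$, if we have $\lim_{p\to0} \rho(p) = \rho^*$, which
we will show in Appendix D.

To complete the proof of the theorem, it suffices to show

$$\liminf_{p\to0} \frac{r(p) - p\,\dot r(0)}{p^2/2} \ge \frac{\ddot E_0(0,\rho^*)}{\rho^*}.$$

To see this, we choose $\rho = \rho^*$ in (43) and use the uniform convergence again, which gives

$$r(p) \ge -\frac{pz}{\rho^*} + \frac{\dot E_0(0,\rho^*)\,p}{\rho^*} + \frac{\ddot E_0(0,\rho^*)\,p^2}{2\rho^*} - \epsilon p^2.$$

From (44), we must have

$$\dot r(0) = -\frac{z}{\rho^*} + \frac{\dot E_0(0,\rho^*)}{\rho^*},$$

and thus, we have

$$\frac{r(p) - p\,\dot r(0)}{p^2/2} \ge \frac{\ddot E_0(0,\rho^*)}{\rho^*} - \frac{2\epsilon p^2}{p^2}.$$

Letting $p \to 0$ and then $\epsilon \to 0$, we will have

$$\ddot r(0) \ge \frac{\ddot E_0(0,\rho^*)}{\rho^*}.$$

□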

Thus, to obtain the optimal value of $\ddot r(0)$, we need to verify the uniform convergence assumption in Theorem 8
and calculate $\frac{\ddot E_0(0,\rho^*)}{\rho^*}$. To show uniform convergence, we both upper and lower bound

$$\frac{\dfrac{E_0(p,\rho)}{\rho p} - \dfrac{\dot E_0(0,\rho)}{\rho}}{p}$$

by a function of $\rho$ plus a small term $\delta(1)$, which converges to 0 uniformly for $\rho \in [0,1]$ as $p$ goes to 0. Specifically,
we want to show that when $p$ is small, we have

$$\frac{\ddot E_0(0,\rho)}{2\rho} + \delta_1(1) \le \frac{\dfrac{E_0(p,\rho)}{\rho p} - \dfrac{\dot E_0(0,\rho)}{\rho}}{p} \le \frac{\ddot E_0(0,\rho)}{2\rho} + \delta_2(1),$$

where both $\delta_1(1)$ and $\delta_2(1)$ converge to 0 uniformly as $p$ goes to 0. The uniform convergence of the middle expression
follows easily from here. We will first show an upper bound, then we will obtain a lower bound by using QPSK
signaling at the input. In the rest of the paper, we will use the notation $\delta(p^m)$ to denote a term satisfying that, as
$p$ goes to 0, $\frac{\delta(p^m)}{p^m} \to 0$ uniformly for $\rho \in [0,1]$.
We know that

$$E_0(p,\rho) = \sup_{\{q_p\}\in\mathcal{F}(p)} E_0(p, q_p, \rho).$$

However, it is easy to see that we will not lose any optimality if we restrict ourselves to those input distributions
which perform at least as well as QPSK. In other words, we have

$$E_0(p,\rho) = \sup_{\{q_p\}\in\mathcal{G}(p)} E_0(p, q_p, \rho), \tag{52}$$

where $\mathcal{G}(p)$ is defined as

$$\mathcal{G}(p) = \left\{ \{q_p\} \in \mathcal{F}(p) : E_0(p,q_p,\rho) \ge E_0(p,\mathrm{QPSK},\rho),\ \forall p > 0 \right\}. \tag{53}$$

Lemma 8 For any sequence of input distributions $\{q_p(x)\} \in \mathcal{G}(p)$,

gy-gy ^ - inf { 9P } e g( P )^>o/«^ 1+ ^+ e "^ - (54 
p ~ pp 2 

Proof: See Appendix E. □

Next, we further bound $\int \alpha(y)^{1+\rho}\,dy$ for any sequence of input distributions $\{q_p\} \in \mathcal{G}(p)$.

Lemma 9 For all $\beta$, we have

$$
\begin{aligned}
\int \alpha(y)^{1+\rho}\,dy &= \int f_w(y)\,(1 + T(y))^{1+\rho}\,dy \\
&\ge 1 + (1+\rho)\int f_w(y)T(y)\,dy + \frac{\rho(1+\rho)}{2} \int f_w(y)T^2(y)\,dy + \frac{\rho(1+\rho)(\rho-1)}{6} \int f_w(y)T^3(y)\,dy,
\end{aligned}
\tag{55}
$$

where $T(y) = M(y) - 1$ and $M(y)$ is defined by (36).

Proof: The following inequality is true for all $t \ge -1$ and all $\rho \in [0,1]$:

$$(1+t)^{1+\rho} \ge 1 + (1+\rho)t + \frac{\rho(1+\rho)}{2}\,t^2 + \frac{\rho(1+\rho)(\rho-1)}{6}\,t^3.$$

Using the fact that $T(y) \ge -1$ and plugging in the above inequality, we have (55). □
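
The cubic lower bound used in the proof of Lemma 9 can be checked on a grid. The sketch below is only a numerical illustration (the grid ranges are arbitrary assumptions): it evaluates the gap between $(1+t)^{1+\rho}$ and its third-order Taylor polynomial for $t \ge -1$ and $\rho \in [0,1]$ and reports the smallest value found, which should be non-negative up to floating-point rounding. The bound holds because the fourth derivative of $(1+t)^{1+\rho}$ in $t$ is non-negative on this range.

```python
def cubic_lower_bound_gap(t, rho):
    """Return (1+t)^(1+rho) minus its third-order Taylor polynomial around t = 0."""
    poly = (1.0 + (1.0 + rho) * t
            + rho * (1.0 + rho) * t ** 2 / 2.0
            + rho * (1.0 + rho) * (rho - 1.0) * t ** 3 / 6.0)
    return (1.0 + t) ** (1.0 + rho) - poly


# t ranges over [-1, 5] in steps of 0.01, rho over [0, 1] in steps of 0.05
worst_gap = min(cubic_lower_bound_gap(-1.0 + 0.01 * i, 0.05 * j)
                for i in range(0, 601) for j in range(0, 21))
print(worst_gap)
```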

We will now treat the three terms in (55) separately and find a bound for each of them.

Lemma 10

$$\int f_w(y)T(y)\,dy \ge e^{-\theta p} - 1, \tag{56}$$

where $\theta = \frac{\rho}{(1+\rho)^2}$.

Proof: It is easy to check that

$$\int f_w(y)T(y)\,dy = E\!\left[ e^{\beta(|x|^2-p)}\, e^{-\theta|x|^2} \right] - 1.$$

Applying Jensen's inequality here, we get (56). □

Lemma 11 For any input distribution $\{q_p(x)\} \in \mathcal{G}(p)$, let $\beta^*$ be the optimizing $\beta$, which maximizes

$$\sup_{\beta\ge0}\; -\ln \int \alpha(y)^{1+\rho}\,dy. \tag{57}$$

We have

$$\left. \int f_w(y)T^2(y)\,dy \,\right|_{\beta=\beta^*} \ge \theta^2 p^2 + \frac{p^2}{(1+\rho)^4} + \delta(p^2).$$

Proof: See Appendix F. □

For those input distributions in $\mathcal{G}(p)$, the term with the integral over $T^3(y)$ does not contribute anything
to the second-order calculation, as shown in the following lemma.

Lemma 12 Suppose that $\{q_p(x)\} \in \mathcal{G}(p)$. We have

$$\int f_w(y)T^3(y)\,dy = \delta(p^2).$$

Proof: See Appendix G. □
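With these results, it is straightforward to show the required uniform convergence.

Proposition 2

$$\frac{\dfrac{E_0(p,\rho)}{\rho p} - \dfrac{\dot E_0(0,\rho)}{\rho}}{p} \to -\frac{1}{2(1+\rho)^3} \quad \text{uniformly for } \rho \in [0,1], \tag{58}$$

as $p$ goes to 0.

Proof: Combining Lemma 11 and Lemma 12, we have

$$
\begin{aligned}
\int \alpha(y)^{1+\rho}\,dy &\ge 1 - \frac{\rho}{1+\rho}\,p + \frac{(1+\rho)\theta^2 p^2}{2} + \frac{\rho(1+\rho)\theta^2 p^2}{2} + \frac{\rho p^2}{2(1+\rho)^3} + \rho\,\delta(p^2) \\
&= 1 - \frac{\rho p}{1+\rho} + \frac{\rho^2 p^2}{2(1+\rho)^2} + \frac{\rho p^2}{2(1+\rho)^3} + \rho\,\delta(p^2).
\end{aligned}
$$

Applying Lemma 8 here, we can obtain that

$$\frac{\dfrac{E_0(p,\rho)}{\rho p} - \dfrac{\dot E_0(0,\rho)}{\rho}}{p} \le -\frac{1}{2(1+\rho)^3} + \delta(1).$$

Later, we will show that by choosing the input distribution to be QPSK, we can establish a lower bound which
has the same expression as the upper bound. Thus, we know (58) is true. □

Since we know $\rho^* = \frac{\sqrt z}{1-\sqrt z}$, the following corollary is a direct consequence of Theorem 8.

Corollary 1 For $0 < z < \tfrac14$, we have

$$\ddot r(0) = -(1-\sqrt z)^3. \tag{59}$$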
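
Corollary 1 can be illustrated numerically. The sketch below is not a computation of the true $E_0$: as a stated assumption, it replaces $E_0(p,\rho)$ by the second-order model $\frac{\rho p}{1+\rho} - \frac{\rho p^2}{2(1+\rho)^3}$ suggested by Lemma 5 and Proposition 2, evaluates $r(p)$ from (43) by grid search, and compares $(r(p) - p\,\dot r(0))/(p^2/2)$ at a small $p$ with $-(1-\sqrt z)^3$. The function name and sample values are illustrative.

```python
import math


def rate_from_model(p, z, grid=20000):
    """Evaluate (43) by grid search, with E0 replaced by its second-order model in p."""
    best = -float("inf")
    for k in range(1, grid + 1):
        rho = k / grid
        e0_model = rho * p / (1.0 + rho) - rho * p * p / (2.0 * (1.0 + rho) ** 3)
        best = max(best, (-p * z + e0_model) / rho)
    return best


z, p = 0.09, 1e-3                       # illustrative values, z in (0, 1/4)
r_dot = (1.0 - math.sqrt(z)) ** 2       # first-order term from Proposition 1
second_order = (rate_from_model(p, z) - p * r_dot) / (p * p / 2.0)
predicted = -(1.0 - math.sqrt(z)) ** 3  # Corollary 1
print(second_order, predicted)
```

The two printed numbers agree up to a correction of order $p$, which is what the limit in Theorem 8 predicts.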



4.5 BPSK and QPSK 

Combining the results regarding $\dot r(0)$ and $\ddot r(0)$ in the previous subsections and Theorem 5, we have proved
Theorem 1. Regarding Theorem 2, the first part of the theorem is a direct consequence of Lemma 7, which has
already been proved. For the second part of the theorem, regarding BPSK and QPSK signaling, we can again do
the calculations in a scalar channel with small power, as in the proof of Theorem 1. The
calculations are rather straightforward, and we put the detailed proof of this part in the Appendix for completeness.

5 Proof of Theorem 3 and Theorem 4

In this section, we will prove Theorem 3 and Theorem 4. For simplicity, we only prove the case $B = 1$, i.e.,
we focus on one of the $B$ parallel channels in the channel model. The extension to the general case with $B$
parallel channels is quite straightforward. Since $B = 1$, we drop the subscript $l$ in the channel model and we have

$$y = Hx + w. \tag{60}$$

We assume the average power available in each block is $PT_c$, i.e.,

$$E[\|x\|^2] = PT_c. \tag{61}$$

Thus, the energy per degree of freedom is $P/W_c$, which is small when $W_c$ is large.

In this proof, we will use the results for AWGN channels extensively. To avoid confusion in the notation, we
will use a superscript "NF" (Non-Fading) to denote any quantity that was computed for the AWGN channel.

5.1 $R(0)$ and first-order optimal condition



In the near-capacity region ($R \ge R_{crit}$), where the random-coding exponent and sphere-packing exponent are
tight, the reliability function constraint can be written as

$$\sup_{0\le\rho\le1}\; -\rho R + E_0(P,\rho,W_c) = z,$$

and

$$E_0(P,\rho,W_c) = \frac{1}{T_c} \sup_{q\in\mathcal{F}_{W_c}(P)}\ \sup_{\beta\ge0}\ -\ln E_H\!\left[ \int \left( \int q(x)\, e^{\beta(\|x\|^2 - PT_c)} f(y|x,H)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy \right]. \tag{62}$$

Similar to the AWGN case, we first show that $E_0(P,\rho,W_c)$ is always a bounded quantity.
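Lemma 13 For any $\rho \in [0,1]$,

$$0 \le E_0(P,\rho,W_c) \le \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right). \tag{63}$$

Proof: The lower bound is easy to show from (62), using a similar approach as in the AWGN case:

$$
\begin{aligned}
T_c E_0(P,\rho,W_c) &\ge \sup_{q\in\mathcal{F}_{W_c}(P)} -\ln E_H\!\left[ \int \left( \int q(x) f(y|x,H)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy \right] &(64)\\
&\ge \sup_{q\in\mathcal{F}_{W_c}(P)} -\ln E_H\!\left[ \int \left( \int q(x) f(y|x,H)\,dx \right) dy \right] &(65)\\
&= 0.
\end{aligned}
$$

The inequality in (64) comes from taking $\beta = 0$, and the inequality in (65) follows from Jensen's inequality, by
noticing that $t^{1+\rho}$ is a convex function.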

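To show the upper bound, we move the two supremums inside the expectation over $H$:

$$T_c E_0(P,\rho,W_c) \le -\ln E_H\!\left[ \inf_{q_H} \inf_{\beta_H\ge0} \int \left( \int q_H(x)\, e^{\beta_H(\|x\|^2 - PT_c)} f(y|x,H)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy \right].$$

Now for each realization of $H$, we choose the best $q_H(x)$ and $\beta_H$ to optimize the integrand in the equation above.
This is the same as finding the optimal $q(x)$ and $\beta$ in an AWGN vector channel with a fixed gain $H$. Thus,
we do not lose any optimality by choosing $q(x)$ to be i.i.d. in all components of the vector. Denote $q_H(x) = \prod_{i=1}^{D} q_H(x_i)$, and we have

$$
\begin{aligned}
T_c E_0(P,\rho,W_c) &\le -\ln E_H\!\left[ \inf_{q_H} \inf_{\beta_H\ge0} \prod_{i=1}^{D} \int \left( \int q_H(x)\, e^{\beta_H(|x|^2 - P/W_c)} f(y|x,H)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy \right] \\
&= -\ln E_H\!\left[ \left( \inf_{q} \inf_{\beta\ge0} \int \left( \int q(x)\, e^{\beta(|H|^2|x|^2 - P|H|^2/W_c)} f_w(y - Hx)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy \right)^{D} \right] \\
&= -\ln E_H\!\left[ \exp\!\left\{ -D\,E_0^{NF}\!\left( \frac{P|H|^2}{W_c}, \rho \right) \right\} \right],
\end{aligned}
\tag{66}
$$

where $E_0^{NF}(p,\rho)$ denotes the $E_0$ for a scalar non-fading (AWGN) channel,

$$E_0^{NF}(p,\rho) = \sup_{q(x)\in\mathcal{F}(p)}\ \sup_{\beta\ge0}\ -\ln \int \left( \int q(x)\, e^{\beta(|x|^2-p)} f_w(y-x)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy.$$

Here $f_w$ denotes the probability density function of a symmetric complex Gaussian random variable with unit
variance.

In the last section, we have already shown that

$$E_0^{NF}(p,\rho) \le \frac{\rho p}{1+\rho}.$$

Plugging this into (66), we get (63). □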
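
The last step, from (66) to the upper bound in Lemma 13, uses the fact that $|H|^2$ is exponentially distributed with unit mean for Rayleigh fading, so that $E_H[e^{-s|H|^2}] = 1/(1+s)$ with $s = \rho PT_c/(1+\rho)$. The following sketch checks this numerically by a simple trapezoid rule; the parameter values and the function name are illustrative assumptions only.

```python
import math


def rayleigh_laplace(s, upper=60.0, n=200000):
    """Trapezoid estimate of E[exp(-s*h)] for h = |H|^2 ~ Exp(1)."""
    step = upper / n
    total = 0.0
    for k in range(n + 1):
        h = k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-(1.0 + s) * h)  # exp(-s*h) times the density exp(-h)
    return total * step


rho, P, Tc = 0.5, 2.0, 3.0           # illustrative values only
s = rho * P * Tc / (1.0 + rho)       # exponent scale after inserting E0^NF <= rho*p/(1+rho)
bound = math.log(1.0 + s) / Tc       # right-hand side of Lemma 13
estimate = -math.log(rayleigh_laplace(s)) / Tc
print(estimate, bound)
```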
With this upper bound, we can find the following equivalent form of the error-exponent constraint, which is
easier for us to work with.

Theorem 9 An alternative form of the error-exponent constraint is

$$R(1/W_c) = \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,W_c)}{\rho}. \tag{67}$$

Proof: Similar to the proof of Theorem 6. □

Corollary 2 In the equivalent form of the error-exponent constraint (67), we can restrict $\rho$ to be in the interval
$[\frac{z}{P}, 1]$ without losing any optimality. In other words,

$$R(1/W_c) = \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,W_c)}{\rho}. \tag{68}$$

Proof: Note that $R(1/W_c)$ is the maximum rate such that the error-exponent constraint is satisfied. For a reasonable
choice of $z$ (we will discuss the range of $z$ that we are interested in later), the supremum in (67) must yield
a non-negative result. Thus, we can restrict ourselves to those $\rho$ such that $E_0(P,\rho,W_c) \ge z$. Applying Lemma 13
here, this further implies

$$\frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) \ge z.$$

Noticing that $\ln\!\left(1 + \frac{\rho PT_c}{1+\rho}\right) \le \frac{\rho PT_c}{1+\rho} \le \rho PT_c$, we have $\rho P \ge z$. Thus, we only need to perform the optimization
over $\rho$ in the interval $[\frac{z}{P}, 1]$. □
Since we are studying the behavior of $R(1/W_c)$ at large $W_c$ for a fixed $z > 0$, the range of $\rho$ in (68) excludes
0, which will be quite helpful in the calculations of $R(0)$ and $\dot R(0)$, as we will show later.

To find the value of $R(0) = \lim_{W_c\to\infty} R(1/W_c)$, an operation of exchanging the order of supremum and
limit is involved. We need the following theorem to justify this operation.

Theorem 10 If, as $W_c$ goes to infinity, for any $\rho \in [0,1]$, the limit of $E_0(P,\rho,W_c)$ exists, which is denoted as
$E_0(P,\rho,\infty)$, and further, $E_0(P,\rho,W_c)$ converges to $E_0(P,\rho,\infty)$ uniformly for $\rho \in [0,1]$, we have

$$R(0) = \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho}. \tag{69}$$

Proof: Uniform convergence of $E_0(P,\rho,W_c)$ gives us the following: for any $\epsilon > 0$, we can find $W_c^{(\epsilon)}$ such that
for any $W_c > W_c^{(\epsilon)}$, we have

$$|E_0(P,\rho,W_c) - E_0(P,\rho,\infty)| < \epsilon, \quad \text{for all } \rho \in [0,1].$$

From (68), we know that for $W_c > W_c^{(\epsilon)}$,

$$
\begin{aligned}
R(1/W_c) &\le \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty) + \epsilon}{\rho} \\
&\le \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho} + \frac{P\epsilon}{z}.
\end{aligned}
$$

Similarly, we can show that

$$R(1/W_c) \ge \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho} - \frac{P\epsilon}{z}.$$

From here, it is easy to see that

$$R(0) = \lim_{W_c\to\infty} R(1/W_c) = \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho}.$$

The supremum over $[\frac{z}{P}, 1]$ and over $[0,1]$ can be shown to be equivalent using a similar argument as in the proof of
Corollary 2. Thus, (69) must be true. □

The uniform convergence can be easily established if we can find a lower bound for $E_0(P,\rho,W_c)$ which
converges to $E_0(P,\rho,\infty)$ uniformly, since we have already obtained an upper bound in Lemma 13. We will use
a widely-used signaling scheme, QPSK signaling, to establish a lower bound for $E_0(P,\rho,W_c)$. Later, we will
discuss the optimality of QPSK and the lack of optimality of another widely-used signaling scheme, BPSK, in
the wideband regime.
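Lemma 14 When the coherence dimension $W_c$ goes to infinity,

$$E_0(P,\rho,W_c) \to \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) \quad \text{uniformly for } \rho \in [0,1].$$

Proof: Because of (63), it suffices to show that for any $\epsilon > 0$, we can find $W_c^{(\epsilon)}$ such that

$$E_0(P,\rho,W_c) \ge \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \epsilon,$$

for any $W_c > W_c^{(\epsilon)}$ and for all $\rho \in [0,1]$.

From the definition of $E_0(P,\rho,W_c)$, we know that for any specific choice of $\{q^*\} \in \mathcal{F}_{W_c}(P)$, we have

$$E_0(P,\rho,W_c) \ge E_0(P,q^*,\rho,W_c), \tag{70}$$

where $E_0(P,q^*,\rho,W_c)$ is defined as follows:

$$E_0(P,q^*,\rho,W_c) = \frac{1}{T_c} \sup_{\beta\ge0}\ -\ln E_H\!\left[ \int \left( \int q^*(x)\, e^{\beta(\|x\|^2 - PT_c)} f(y|x,H)^{\frac{1}{1+\rho}}\,dx \right)^{1+\rho} dy \right]. \tag{71}$$

Now we choose $q^*$ to be QPSK. Since now $\|x\|^2 = PT_c$ with probability 1, the power-constraint parameter
$\beta$ does not affect $E_0(P,\mathrm{QPSK},\rho,W_c)$ and we have

$$E_0(P,\mathrm{QPSK},\rho,W_c) = -\frac{1}{T_c} \ln E_H\!\left[ \exp\!\left\{ -D\,E_0^{NF}\!\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right) \right\} \right], \tag{72}$$

where $E_0^{NF}(p,\mathrm{QPSK},\rho)$ is

$$E_0^{NF}(p,\mathrm{QPSK},\rho) = -\ln \int E_x\!\left[ f_w(y-x)^{\frac{1}{1+\rho}} \right]^{1+\rho} dy.$$

Next we show that for any $\epsilon > 0$, we can find $W_c^{(\epsilon)}$ such that

$$E_0(P,\mathrm{QPSK},\rho,W_c) \ge \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \epsilon.$$

From (72), it suffices to show that

$$\left( 1 + \frac{\rho PT_c}{1+\rho} \right) E_H\!\left[ \exp\!\left\{ -D\,E_0^{NF}\!\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right) \right\} \right] \le e^{\epsilon T_c}. \tag{73}$$

In the last section, we have already shown that as $p \to 0$, $\frac{E_0^{NF}(p,\mathrm{QPSK},\rho)}{\rho p} \to \frac{1}{1+\rho}$ uniformly. In other words, for
any $\epsilon' > 0$, we can find $\xi > 0$ such that for all $p < \xi$,

$$\left| \frac{E_0^{NF}(p,\mathrm{QPSK},\rho)}{\rho p} - \frac{1}{1+\rho} \right| < \epsilon', \quad \text{for all } \rho\in[0,1],$$

which implies

$$E_0^{NF}(p,\mathrm{QPSK},\rho) \ge \frac{\rho p}{1+\rho} - \epsilon'\rho p, \quad \text{for all } \rho \in [0,1]. \tag{74}$$

Note that, writing $\mathbf{1}\{\cdot\}$ for the indicator function,

$$
\begin{aligned}
& E_H\!\left[ \exp\!\left\{ -D\,E_0^{NF}\!\left( \tfrac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right) \right\} \right] \\
&\quad= E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right)}\, \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right)}\, \mathbf{1}\!\left\{ |H|^2 > \tfrac{W_c\xi}{P} \right\} \right] \\
&\quad\le E_H\!\left[ e^{-D\left( \frac{\rho}{1+\rho} - \epsilon'\rho \right) \frac{P|H|^2}{W_c}}\, \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + \Pr\!\left( |H|^2 > \tfrac{W_c\xi}{P} \right) &(75)\\
&\quad\le E_H\!\left[ e^{-D\left( \frac{\rho}{1+\rho} - \epsilon'\rho \right) \frac{P|H|^2}{W_c}} \right] + \Pr\!\left( |H|^2 > \tfrac{W_c\xi}{P} \right). &(76)
\end{aligned}
$$

The inequality in (75) comes from (74) and the fact that $E_0^{NF}(p,\mathrm{QPSK},\rho) \ge 0$. For Rayleigh fading, we can
compute (76) and we have

$$E_H\!\left[ \exp\!\left\{ -D\,E_0^{NF}\!\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right) \right\} \right] \le \frac{1}{1 + \left( \frac{\rho}{1+\rho} - \epsilon'\rho \right) PT_c} + e^{-\frac{W_c\xi}{P}}.$$
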

We choose e' such that e' = jp- We can then find the corresponding £ with respect to this choice of e'. We 



then choose wi^ such that 



e p < 



It is straightforward to check that for all $W_c > W_c^{(\epsilon)}$, (73) holds, which completes the proof of this
lemma. □
In summary, the first-order calculation gives us the following theorem. 



2(1 + P)' 



Theorem 11 Consider a coherent Rayleigh-fading channel (60), where $H$ is a unit complex Gaussian random
variable. The sequence of input distributions of the channel is constrained by $\mathcal{F}_{W_c}(P)$. Let $R(1/W_c)$ be the
maximum rate at which information can be transmitted on this channel, for a given error-exponent constraint

$$E(R,P,W_c) \ge z, \qquad 0 < z < z^*,$$

where $z^*$ is defined by (27). We have

$$R(0) = \lim_{W_c\to\infty} R(1/W_c) = \sup_{0\le\rho\le1}\; -\frac{z}{\rho} + \frac{1}{\rho T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right). \tag{77}$$



Next we present a sufficient condition for a sequence of input distributions $q_{W_c}(x)$ to be first-order optimal.

Lemma 15 Assuming $0 < z < z^*$, where $z^*$ is defined by (27), a sufficient condition for $\{q_{W_c}\}$ to be first-order
optimal is that

$$\lim_{W_c\to\infty} E_0(P, q_{W_c}, \rho^*, W_c) = \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho^* PT_c}{1+\rho^*} \right), \tag{78}$$

where $\rho^*$ is the optimizing $\rho$ for (77).

Proof: Similar to the proof of Lemma 6. □

Similar to the AWGN channel, in the fading channel with large coherence bandwidth $W_c$, it does not take
much to be first-order optimal. We restrict ourselves to those vector input distributions which are i.i.d. in each
dimension. We have the following lemma.

Lemma 16 For i.i.d. input distributions such that $q_{W_c}(x) = \prod_{i=1}^{D} q(x_i)$, a sufficient condition for $\{q_{W_c}(x)\} \in \mathcal{F}_{W_c}(P)$
to be first-order optimal is that $q(x)$ is symmetric around zero, i.e., $q(x) = q(-x)$.

Proof: See the Appendix. □

5.2 $\dot R(0)$ and second-order optimal condition

To compute $\dot R(0)$, we first establish a relationship between $\dot R(0)$ and the derivative of $E_0(P,\rho,W_c)$ with respect
to $1/W_c$.

Theorem 12 If, as $W_c$ goes to infinity, for each $\rho \in [0,1]$, the limit of $W_c[E_0(P,\rho,W_c) - E_0(P,\rho,\infty)]$ exists,
which we denote as $\dot E_0(P,\rho,\infty)$ and which is a continuous function of $\rho$, and further,

$$W_c\,[E_0(P,\rho,W_c) - E_0(P,\rho,\infty)] \to \dot E_0(P,\rho,\infty) \quad \text{uniformly for all } \rho \in [0,1], \tag{79}$$

then $\dot R(0)$ can be determined as

$$\dot R(0) = \frac{\dot E_0(P,\rho^*,\infty)}{\rho^*}, \tag{80}$$

where $\rho^*$ is the optimizing $\rho$ in (77).

Proof: The uniform convergence in (79) tells us: for any $\epsilon > 0$, we can find $W_c^{(\epsilon)}$ such that for all $W_c > W_c^{(\epsilon)}$,
we have

$$\left| W_c\,[E_0(P,\rho,W_c) - E_0(P,\rho,\infty)] - \dot E_0(P,\rho,\infty) \right| < \epsilon, \quad \forall \rho \in [0,1]. \tag{81}$$
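In other words, we know

$$E_0(P,\rho,W_c) \le E_0(P,\rho,\infty) + \frac{1}{W_c}\dot E_0(P,\rho,\infty) + \frac{\epsilon}{W_c}.$$

Applying Corollary 2 here, we know that for $W_c > W_c^{(\epsilon)}$,

$$
\begin{aligned}
R(1/W_c) &= \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,W_c)}{\rho} \\
&\le \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty) + \frac{1}{W_c}\dot E_0(P,\rho,\infty) + \frac{\epsilon}{W_c}}{\rho} \\
&\le \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho} + \frac{\dot E_0(P,\rho,\infty)}{\rho W_c} + \frac{\epsilon P}{W_c z}.
\end{aligned}
$$

Assume $\rho(W_c)$ is the optimizing $\rho$ for $\sup_{\frac{z}{P}\le\rho\le1} -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho} + \frac{\dot E_0(P,\rho,\infty)}{\rho W_c}$. Since the optimization is over a
compact interval, if $\frac{E_0(P,\rho,\infty)}{\rho} + \frac{\dot E_0(P,\rho,\infty)}{\rho W_c}$ is continuous in $\rho$, the optimizing $\rho$ must exist. The first-order
calculation already gave us

$$E_0(P,\rho,\infty) = \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right),$$

which is a continuous function of $\rho$, and we are assuming here that $\dot E_0(P,\rho,\infty)$ is continuous in $\rho$, so
$\frac{E_0(P,\rho,\infty)}{\rho} + \frac{\dot E_0(P,\rho,\infty)}{\rho W_c}$ is continuous in $\rho$ as well. Thus, it is well justified to denote $\rho(W_c)$ as the optimizing $\rho$ here.
Using this notation, we can further bound $R(1/W_c)$ as follows:

$$
\begin{aligned}
R(1/W_c) &\le \left\{ \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho} \right\} + \frac{\dot E_0(P,\rho(W_c),\infty)}{\rho(W_c)\,W_c} + \frac{\epsilon P}{W_c z} \\
&= R(0) + \frac{\dot E_0(P,\rho(W_c),\infty)}{\rho(W_c)\,W_c} + \frac{\epsilon P}{W_c z}.
\end{aligned}
$$

If we define $\dot R(0) = \limsup_{W_c\to\infty} W_c\,[R(1/W_c) - R(0)]$, we have

$$
\begin{aligned}
\dot R(0) &\le \limsup_{W_c\to\infty} \frac{\dot E_0(P,\rho(W_c),\infty)}{\rho(W_c)} + \frac{\epsilon P}{z} \\
&= \frac{\dot E_0(P,\rho^*,\infty)}{\rho^*} + \frac{\epsilon P}{z}.
\end{aligned}
$$

Here we use the fact

$$\lim_{W_c\to\infty} \rho(W_c) = \rho^* \tag{82}$$

and the assumption that $\dot E_0(P,\rho,\infty)$ is a continuous function of $\rho$. The proof of (82) is similar to Appendix D.
Letting $\epsilon$ go to 0, we know

$$\dot R(0) \le \frac{\dot E_0(P,\rho^*,\infty)}{\rho^*}.$$

On the other hand, (81) also implies

$$
\begin{aligned}
R(1/W_c) &\ge \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty) + \frac{1}{W_c}\dot E_0(P,\rho,\infty) - \frac{\epsilon}{W_c}}{\rho} \\
&\ge \sup_{\frac{z}{P}\le\rho\le1}\; -\frac{z}{\rho} + \frac{E_0(P,\rho,\infty)}{\rho} + \frac{\dot E_0(P,\rho,\infty)}{\rho W_c} - \frac{\epsilon P}{W_c z} \\
&\ge -\frac{z}{\rho^*} + \frac{E_0(P,\rho^*,\infty)}{\rho^*} + \frac{\dot E_0(P,\rho^*,\infty)}{\rho^* W_c} - \frac{\epsilon P}{W_c z} \\
&= R(0) + \frac{\dot E_0(P,\rho^*,\infty)}{\rho^* W_c} - \frac{\epsilon P}{W_c z}.
\end{aligned}
$$

Letting $\epsilon \to 0$, we have

$$\dot R(0) = \liminf_{W_c\to\infty} W_c\,[R(1/W_c) - R(0)] \ge \frac{\dot E_0(P,\rho^*,\infty)}{\rho^*}.$$

□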



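Next we verify the uniform convergence assumption needed in Theorem 12.

Lemma 17 As $W_c$ goes to infinity, we have

$$W_c\,[E_0(P,\rho,W_c) - E_0(P,\rho,\infty)] \to -\frac{\rho P^2}{(1+\rho)(1+\rho+\rho PT_c)^2} \quad \text{uniformly for } \rho \in [0,1].$$

Proof: To show the uniform convergence result, we find both an upper bound and a lower bound for

$$W_c\,[E_0(P,\rho,W_c) - E_0(P,\rho,\infty)],$$

and both bounds converge uniformly to $-\frac{\rho P^2}{(1+\rho)(1+\rho+\rho PT_c)^2}$.

For notational convenience, we introduce the notation $\delta\!\left(\frac{1}{W_c^m}\right)$, which indicates a term satisfying

$$\lim_{W_c\to\infty} W_c^m\, \delta\!\left( \frac{1}{W_c^m} \right) = 0, \quad \text{uniformly for } \rho \in [0,1].$$

Using this notation, what we need to show here is

$$E_0(P,\rho,W_c) \le \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \frac{\rho P^2}{W_c\,(1+\rho)(1+\rho+\rho PT_c)^2} + \delta\!\left( \frac{1}{W_c} \right), \tag{84}$$

$$E_0(P,\rho,W_c) \ge \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \frac{\rho P^2}{W_c\,(1+\rho)(1+\rho+\rho PT_c)^2} + \delta\!\left( \frac{1}{W_c} \right). \tag{85}$$

For the upper bound, we again use the inequality (66), which gives us

$$E_0(P,\rho,W_c) \le -\frac{1}{T_c} \ln E_H\!\left[ e^{-D\,E_0^{NF}\left( \frac{P|H|^2}{W_c},\, \rho \right)} \right].$$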



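We showed that $\frac{1}{p}\left[ \frac{E_0^{NF}(p,\rho)}{\rho p} - \frac{1}{1+\rho} \right]$ converges to $-\frac{1}{2(1+\rho)^3}$ uniformly, or equivalently, for any $\epsilon > 0$, we
can find $\xi > 0$ such that for any $p < \xi$,

$$\frac{\rho p}{1+\rho} - \frac{\rho p^2}{2(1+\rho)^3} - \epsilon\rho p^2 \le E_0^{NF}(p,\rho) \le \frac{\rho p}{1+\rho} - \frac{\rho p^2}{2(1+\rho)^3} + \epsilon\rho p^2.$$

Thus, writing $\mathbf{1}\{\cdot\}$ for the indicator function and, for brevity,

$$g(|H|^2) = e^{-\frac{\rho PT_c|H|^2}{1+\rho}} \left( 1 + \frac{\rho P^2|H|^4 T_c}{2W_c(1+\rho)^3} - \frac{\epsilon\rho P^2|H|^4 T_c}{W_c} \right),$$

we have

$$
\begin{aligned}
E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \rho \right)} \right]
&\ge E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \rho \right)}\, \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] \\
&\ge E_H\!\left[ \exp\!\left( -\frac{\rho PT_c|H|^2}{1+\rho} + \frac{\rho P^2|H|^4 T_c}{2W_c(1+\rho)^3} - \frac{\epsilon\rho P^2|H|^4 T_c}{W_c} \right) \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] \\
&\ge E_H\!\left[ g(|H|^2)\, \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] \\
&= E_H\!\left[ g(|H|^2) \right] - E_H\!\left[ g(|H|^2)\, \mathbf{1}\!\left\{ |H|^2 > \tfrac{W_c\xi}{P} \right\} \right] \\
&= \frac{1}{1 + \frac{\rho PT_c}{1+\rho}} \left( 1 + \frac{\rho P^2 T_c}{(1+\rho)(1+\rho+\rho PT_c)^2\, W_c} - \frac{2\epsilon\rho P^2 T_c}{\left( 1 + \frac{\rho PT_c}{1+\rho} \right)^2 W_c} \right) - E_H\!\left[ g(|H|^2)\, \mathbf{1}\!\left\{ |H|^2 > \tfrac{W_c\xi}{P} \right\} \right].
\end{aligned}
$$

Thus,

$$
\begin{aligned}
E_0(P,\rho,W_c) \le{}& \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) \\
& - \frac{1}{T_c} \ln\!\left\{ 1 + \frac{\rho P^2 T_c}{(1+\rho)(1+\rho+\rho PT_c)^2\, W_c} - \frac{2\epsilon\rho P^2 T_c}{\left( 1 + \frac{\rho PT_c}{1+\rho} \right)^2 W_c} - \left( 1 + \frac{\rho PT_c}{1+\rho} \right) E_H\!\left[ g(|H|^2)\, \mathbf{1}\!\left\{ |H|^2 > \tfrac{W_c\xi}{P} \right\} \right] \right\}.
\end{aligned}
$$

Since we can choose an arbitrarily small $\epsilon$ here, and the contribution of the event $\{|H|^2 > W_c\xi/P\}$ decays exponentially in $W_c$, it is straightforward to show that the term

$$- \frac{2\epsilon\rho P^2 T_c}{\left( 1 + \frac{\rho PT_c}{1+\rho} \right)^2 W_c} - \left( 1 + \frac{\rho PT_c}{1+\rho} \right) E_H\!\left[ g(|H|^2)\, \mathbf{1}\!\left\{ |H|^2 > \tfrac{W_c\xi}{P} \right\} \right]$$

is actually $\delta\!\left(\frac{1}{W_c}\right)$. Thus, we have

$$
\begin{aligned}
E_0(P,\rho,W_c) &\le \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho P^2 T_c}{(1+\rho)(1+\rho+\rho PT_c)^2\, W_c} + \delta\!\left( \frac{1}{W_c} \right) \right) \\
&= \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \frac{\rho P^2}{(1+\rho)(1+\rho+\rho PT_c)^2\, W_c} + \delta\!\left( \frac{1}{W_c} \right).
\end{aligned}
$$
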

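For the lower bound, we again use the QPSK calculation:

$$E_0(P,\rho,W_c) \ge E_0(P,\mathrm{QPSK},\rho,W_c) = -\frac{1}{T_c} \ln E_H\!\left[ \exp\!\left\{ -D\,E_0^{NF}\!\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right) \right\} \right].$$

In the last section, we have already shown that

$$\frac{1}{p}\left[ \frac{E_0^{NF}(p,\mathrm{QPSK},\rho)}{\rho p} - \frac{1}{1+\rho} \right] \to -\frac{1}{2(1+\rho)^3}, \quad \text{uniformly for } \rho \in [0,1].$$

Equivalently, for any $\epsilon > 0$, we can find $\xi > 0$ such that for all $\rho \in [0,1]$ and all $p < \xi$,

$$\frac{\rho p}{1+\rho} - \frac{\rho p^2}{2(1+\rho)^3} - \epsilon\rho p^2 \le E_0^{NF}(p,\mathrm{QPSK},\rho) \le \frac{\rho p}{1+\rho} - \frac{\rho p^2}{2(1+\rho)^3} + \epsilon\rho p^2.$$

Thus, writing $\mathbf{1}\{\cdot\}$ for the indicator function, we have

$$
\begin{aligned}
& E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right)} \right] \\
&\quad= E_H\!\left[ e^{-D E_0^{NF}(\cdot)}\, \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + E_H\!\left[ e^{-D E_0^{NF}(\cdot)}\, \mathbf{1}\!\left\{ |H|^2 > \tfrac{W_c\xi}{P} \right\} \right] \\
&\quad\le E_H\!\left[ \exp\!\left( -D\left( \frac{\rho P|H|^2}{W_c(1+\rho)} - \frac{\rho P^2|H|^4}{2W_c^2(1+\rho)^3} - \frac{\epsilon\rho P^2|H|^4}{W_c^2} \right) \right) \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + e^{-\frac{W_c\xi}{P}} \\
&\quad= E_H\!\left[ \exp\!\left( -\frac{\rho P|H|^2 T_c}{1+\rho} + \frac{\rho P^2|H|^4 T_c}{2W_c(1+\rho)^3} + \frac{\epsilon\rho P^2|H|^4 T_c}{W_c} \right) \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + e^{-\frac{W_c\xi}{P}}. 
\end{aligned}
\tag{86}
$$

A useful inequality we can use here is the following:

$$e^t \le 1 + t + t^2 e^t, \quad \forall t \in \mathbb{R}. \tag{87}$$

To show the validity of (87), we check (87) for two cases: $t \ge 1$ and $t < 1$. When $t \ge 1$, (87) is trivial. When
$t < 1$, we start with the following well-known inequality: $e^{-t} \ge 1 - t$. Since $t < 1$, this leads to

$$e^t \le \frac{1}{1-t} = \frac{1+t}{1-t^2}.$$

From here, it is easy to see that (87) is true.

Define

$$\eta = \frac{\rho}{2(1+\rho)^3} + \epsilon\rho.$$

Applying (87) in (86), we have

$$
\begin{aligned}
& E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right)} \right] \\
&\quad\le E_H\!\left[ e^{-\frac{\rho P|H|^2 T_c}{1+\rho}} \left( 1 + \eta\frac{P^2|H|^4 T_c}{W_c} + \eta^2\frac{P^4|H|^8 T_c^2}{W_c^2}\, e^{\eta\frac{P^2|H|^4 T_c}{W_c}} \right) \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + e^{-\frac{W_c\xi}{P}} \\
&\quad\le E_H\!\left[ e^{-\frac{\rho P|H|^2 T_c}{1+\rho}} \left( 1 + \eta\frac{P^2|H|^4 T_c}{W_c} \right) \right] + E_H\!\left[ \eta^2\frac{P^4|H|^8 T_c^2}{W_c^2}\, e^{-\frac{\rho P|H|^2 T_c}{1+\rho} + \eta\frac{P^2|H|^4 T_c}{W_c}}\, \mathbf{1}\!\left\{ |H|^2 \le \tfrac{W_c\xi}{P} \right\} \right] + e^{-\frac{W_c\xi}{P}}.
\end{aligned}
\tag{88}
$$

For the second term in (88), since $|H|^2 \le \frac{W_c\xi}{P}$, we have

$$-\frac{\rho PT_c|H|^2}{1+\rho} + \eta\frac{P^2|H|^4 T_c}{W_c} \le -\frac{\rho PT_c|H|^2}{1+\rho} + \left( \frac{\rho}{2(1+\rho)^3} + \epsilon\rho \right) PT_c|H|^2\,\xi.$$

For sufficiently small $\epsilon$ and $\xi$, we have

$$-\frac{\rho PT_c|H|^2}{1+\rho} + \eta\, PT_c|H|^2\,\xi \le 0.$$

Thus, we can further bound (88) as follows:

$$
\begin{aligned}
E_H\!\left[ e^{-D E_0^{NF}\left( \frac{P|H|^2}{W_c}, \mathrm{QPSK}, \rho \right)} \right]
&\le E_H\!\left[ e^{-\frac{\rho P|H|^2 T_c}{1+\rho}} \left( 1 + \eta\frac{P^2|H|^4 T_c}{W_c} \right) \right] + E_H\!\left[ \eta^2\frac{P^4|H|^8 T_c^2}{W_c^2} \right] + e^{-\frac{W_c\xi}{P}} \\
&= \frac{1}{1 + \frac{\rho PT_c}{1+\rho}} \left( 1 + \frac{2\eta P^2 T_c}{\left( 1 + \frac{\rho PT_c}{1+\rho} \right)^2 W_c} \right) + \delta\!\left( \frac{1}{W_c} \right).
\end{aligned}
$$

Thus,

$$
\begin{aligned}
E_0(P,\rho,W_c) &\ge \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho P^2 T_c}{(1+\rho)(1+\rho+\rho PT_c)^2\, W_c} + \frac{2\epsilon\rho P^2 T_c}{\left( 1 + \frac{\rho PT_c}{1+\rho} \right)^2 W_c} + \delta\!\left( \frac{1}{W_c} \right) \right) \\
&\ge \frac{1}{T_c} \ln\!\left( 1 + \frac{\rho PT_c}{1+\rho} \right) - \frac{\rho P^2}{(1+\rho)(1+\rho+\rho PT_c)^2\, W_c} + \delta\!\left( \frac{1}{W_c} \right).
\end{aligned}
$$

Thus, we have shown both (84) and (85). From these two equations, it is easy to see the uniform convergence
as claimed in Lemma 17. □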
Combining Lemma 17 and Theorem 12, we have the following theorem.

Theorem 13 Consider a coherent Rayleigh-fading vector channel (60), where $H$ is a unit complex Gaussian
random variable. Let $R(1/W_c)$ be the maximum rate at which information can be transmitted on this channel.
The sequence of input distributions of the channel is constrained by $\mathcal{F}_{W_c}(P)$. For a given error-exponent
constraint

$$E(R,P,W_c) \ge z, \qquad 0 < z < z^*,$$

where $z^*$ is defined by (27), we have

$$\dot R(0) = -\frac{P^2}{(1+\rho^*)(1+\rho^*+\rho^* PT_c)^2}, \tag{89}$$

where $\rho^*$ is the optimizing $\rho$ in (77). □
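
To make Theorems 11 and 13 concrete, the sketch below evaluates (77) by grid search to obtain $R(0)$ and $\rho^*$, and then evaluates the wideband slope in (89), together with the corresponding BPSK slope (twice the QPSK value, as discussed for Theorem 14). The parameter values are illustrative assumptions chosen so that the maximizer falls strictly inside $(0,1)$; they are not taken from the paper, and the function name is hypothetical.

```python
import math


def fading_wideband_limits(z, P, Tc, grid=100000):
    """Grid search of (77) for R(0) and rho*, then the slopes for QPSK (89) and BPSK."""
    best, rho_star = -float("inf"), None
    for k in range(1, grid + 1):
        rho = k / grid
        val = -z / rho + math.log(1.0 + rho * P * Tc / (1.0 + rho)) / (rho * Tc)
        if val > best:
            best, rho_star = val, rho
    denom = (1.0 + rho_star) * (1.0 + rho_star + rho_star * P * Tc) ** 2
    slope_qpsk = -P * P / denom        # equation (89)
    slope_bpsk = -2.0 * P * P / denom  # BPSK loses a factor of two in the slope
    return best, rho_star, slope_qpsk, slope_bpsk


R0, rho_star, slope_qpsk, slope_bpsk = fading_wideband_limits(z=0.05, P=1.0, Tc=1.0)
print(R0, rho_star, slope_qpsk, slope_bpsk)
```

For large $W_c$, the rate is then approximately $R(0) + \dot R(0)/W_c$, so the BPSK penalty shows up only in the $1/W_c$ term.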

Theorem 14 Both BPSK and QPSK are first-order optimal for any given z £ (0,2*); however, only QPSK is 
second-order optimal. 

Proof: The first-order optimality of BPSK and QPSK can be easily seen from Lemma ^] In the proof of 
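Lemma 17, we essentially showed that by choosing the input distribution of QPSK,

$$E_0(P, \mathrm{QPSK}, \rho, W_c) \ge \frac{1}{T_c}\ln\left(1 + \frac{\rho P T_c}{1+\rho}\right) - \frac{\rho P^2}{(1+\rho)(1+\rho+\rho P T_c)^2 W_c} + o\left(\frac{1}{W_c}\right).$$

On the other hand, it was also shown in the proof of Lemma 17 that

$$E_0(P, \mathrm{QPSK}, \rho, W_c) \le E_0(P, \rho, W_c) \le \frac{1}{T_c}\ln\left(1 + \frac{\rho P T_c}{1+\rho}\right) - \frac{\rho P^2}{(1+\rho)(1+\rho+\rho P T_c)^2 W_c} + o\left(\frac{1}{W_c}\right).$$

Thus, we must have

$$W_c\left[E_0(P, \mathrm{QPSK}, \rho, W_c) - \frac{1}{T_c}\ln\left(1 + \frac{\rho P T_c}{1+\rho}\right)\right] \to -\frac{\rho P^2}{(1+\rho)(1+\rho+\rho P T_c)^2} \quad \text{uniformly for } \rho \in [0,1].$$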
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Following a similar argument as in Theorem^] we can easily obtain 

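$$\dot{R}_{\mathrm{QPSK}}(0) = \dot{R}(0) = -\frac{P^2}{(1+\rho^*)(1+\rho^*+\rho^* P T_c)^2}.$$

For BPSK, using the result in the last section regarding BPSK, we obtain

$$W_c\left[E_0(P, \mathrm{BPSK}, \rho, W_c) - \frac{1}{T_c}\ln\left(1 + \frac{\rho P T_c}{1+\rho}\right)\right] \to -\frac{2\rho P^2}{(1+\rho)(1+\rho+\rho P T_c)^2} \quad \text{uniformly for } \rho \in [0,1].$$

Thus,

$$\dot{R}_{\mathrm{BPSK}}(0) = -\frac{2P^2}{(1+\rho^*)(1+\rho^*+\rho^* P T_c)^2}.$$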

Therefore, QPSK is near optimal while BPSK is not. 



6 Conclusions 

In this paper, we have studied the maximum rate at which information transmission is possible in additive
Gaussian noise channels and coherent fading channels, for a given error exponent in the wideband regime. Given a
desired error exponent, our main contribution is the calculation of the above rate and its derivative in the limit
when the available bandwidth goes to infinity. For fading channels, we focus on the case when the coherence
bandwidth $W_c$ is large. This also leads to a notion of near-optimality of input distributions, where a sequence of
distributions is defined to be near-optimal if it achieves both the rate and its derivative in the infinite bandwidth
limit. As in Fffl . we show that for both AWGN and coherent fading channels, while QPSK is near-optimal, 
BPSK is not. 

This result is surprising to some extent. In general, it is not well understood which signaling scheme is
optimal; that is, given a coding rate, it is difficult to find the input distribution that gives the smallest probability
of decoding error. In this paper, we consider the problem from an alternate point of view: we fix an error
exponent and consider the signaling schemes that give the largest communication rate. The capacity-achieving
schemes, which correspond to zero error exponent, are not necessarily the best schemes from the
error-exponent point of view. However, the results in this paper tell us that, in the wideband regime, QPSK is
near-optimal with respect to a nonzero error exponent, just as it is near-optimal for the capacity case, for both AWGN
and coherent fading channels. Thus, in the wideband regime, QPSK is near-optimal not only in terms of capacity
but also in terms of the probability of decoding error.
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A The reliability function 



In this section, we will summarize some important bounds on the reliability function. To be consistent with other 
literature, we will use the traditional notation for the reliability function (as just a function of R) to present the 
bounds. Please note that elsewhere in this paper, the reliability function is defined as in (0. 

Definition 8 [4] Let $P_e(N, R)$ be the minimum probability of error for any block code of block length $N$ and
rate $R$ for a given channel. The reliability function $E(R)$ of this channel is defined as

$$E(R) = \lim_{N \to \infty} \frac{-\ln P_e(N, R)}{N}.$$

□

In [3], Gallager provides an upper bound for the probability of error of a discrete memoryless channel
(DMC). This result can be extended to a discrete-time memoryless channel with a continuous alphabet and
an average power constraint, as stated in Theorem 10 of [3].

Theorem 15 ^]|?^ Let f(y\x) be the transition probability density of a discrete-time memoryless channel and 
assume that each codeword is constrained to satisfy $\sum_{n=1}^{N} |x_n|^2 \le NP$. Then, for any block code with length $N$
and rate $R$, there exists a code for which

$$E(R) \ge E_r(R), \tag{91}$$

with

$$E_r(R) = \sup_{0 \le \rho \le 1} -\rho R + E_0(\rho),$$

$$E_0(\rho) = \sup_{q:\, E_x(|x|^2) \le P}\ \sup_{\beta \ge 0}\ -\ln \int \left( \int q(x)\, e^{\beta(|x|^2 - P)} f(y|x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy. \tag{92}$$



We will refer to $E_r(R)$ as the random-coding exponent of the channel and $\beta$ as the power-constraint parameter.

Finding a lower bound on the error probability (or equivalently, an upper bound on the reliability function) for
a given channel is a much harder problem. In [2], Fano derived the sphere-packing lower bound for a discrete
memoryless channel (DMC) in a heuristic manner. The first rigorous proof was provided by Shannon et al.
in [10]. In [1], Blahut provided a simpler and more intuitive proof by connecting the decoding error
probability to a binary hypothesis-testing problem. The sphere-packing exponent $E_{sp}(R)$ coincides with the
random-coding exponent $E_r(R)$ for rates larger than the critical rate $R_{crit}$, at which the optimizing $\rho$ equals 1.
Gallager also extended the lower-bound result to a DMC with a power constraint in [4] and noted that the
random-coding exponent in this case also coincides with the sphere-packing exponent for $R > R_{crit}$. In a later work [5],
he indicates that the lower bound is also applicable to a discrete-time channel with a finite, discrete
set of input symbols and a continuous output alphabet.

Theorem 16 Consider a discrete-time memoryless channel with a discrete finite input alphabet $\{x_1, x_2, \ldots, x_K\}$
whose average input power is constrained by $P$. Let $f(y|x)$ be the transition probability distribution. For any
$(N, R)$ code, we have

$$E(R) \le E_{sp}(R), \tag{93}$$

with

$$E_{sp}(R) = \sup_{\rho \ge 0} -\rho R + E_0(\rho),$$

$$E_0(\rho) = \sup_{q:\, E_x(|x|^2) \le P}\ \sup_{\beta \ge 0}\ -\ln \int \left( \sum_{k=1}^{K} q(x_k)\, e^{\beta(|x_k|^2 - P)} f(y|x_k)^{\frac{1}{1+\rho}} \right)^{1+\rho} dy. \tag{94}$$

□

As in [4], using the Kuhn-Tucker conditions, we can derive a necessary and sufficient condition for $q$ and $\beta$
to be optimal.

Lemma 18 [4] A necessary and sufficient condition for $q$ and $\beta$ to optimize (94) is

$$\int \alpha(y)^{\rho}\, e^{\beta(|x_k|^2 - P)} f(y|x_k)^{\frac{1}{1+\rho}}\, dy \ge \int \alpha(y)^{1+\rho}\, dy, \quad \forall x_k, \tag{95}$$

with equality if $q(x_k) > 0$, where

$$\alpha(y) = \sum_{k=1}^{K} q(x_k)\, e^{\beta(|x_k|^2 - P)} f(y|x_k)^{\frac{1}{1+\rho}}. \tag{96}$$

Unfortunately, the sphere-packing result cannot be applied to the case with an infinite number of input
symbols. Thus, throughout this paper, we only consider input distributions with a discrete and finite input alphabet.
If we constrain the input distributions to be in D(P) as defined by Definition^ it is easy to see that the only 
difference between the random-coding exponent and the sphere-packing exponent is the range of p on which 
the optimization is performed. Thus, for $R$ larger than the critical rate $R_{crit}$, where the optimizing $\rho = 1$, the
random-coding exponent and the sphere-packing exponent coincide and give the true expression for
the reliability function.
Figure 6: The reliability function for AWGN channel with infinite bandwidth (axis marks: 1/4, z, 1/2 - z)

B Proof of Lemma El 

We prove this lemma by contradiction. Given an error-exponent constraint $z < 1/4$, assume that for any $B_z < \infty$
we can find $B > B_z$ such that $R(1/B) \ne R_r(1/B)$. A direct consequence of this assumption is that
the critical rate at bandwidth $B$, which we denote by $R_{crit}(1/B)$, satisfies

$$E(R_{crit}(1/B)) < z. \tag{97}$$

For simplicity, in this proof, we assume $P = 1$. The infinite-bandwidth reliability function of the AWGN channel
is shown in Figure 6. Now we study the possible position of the point $(R_{crit}(1/B), z_{crit}(1/B))$ in this figure.

Since the error exponent for any given rate is a non-decreasing function of $B$, a trivial observation is that
the point $(R_{crit}(1/B), z_{crit}(1/B))$ has to lie below the infinite-bandwidth reliability function.
Equation (97) further tells us that it cannot be in region III. Now we argue that it cannot be in region II either.
If the point were in region II, the linear part of the random-coding exponent would intersect the
infinite-bandwidth reliability function curve, and thus for some communication rate,
using a finite bandwidth $B/2$ would be better than using infinite bandwidth. This cannot be true; as a consequence,
$(R_{crit}(1/B), z_{crit}(1/B))$ can only be in region I, which is the shaded region.

Next consider the random-coding exponent for rate $1/2 - z$. It is straightforward to see that

$$E_r(1/2 - z, B) < z < E_r(1/2 - z, \infty).$$

Combining this with our assumption, we know that the following equation cannot be true:

$$\lim_{B \to \infty} E_r(1/2 - z, B) = E_r(1/2 - z, \infty).$$

However, it is well known that for any rate between 0 and capacity, the random-coding exponent converges
to the infinite-bandwidth error exponent as the bandwidth increases to infinity. Thus, we have a contradiction.



C Proof of Theorem 6

The error-exponent constraint gives us

$$pz = \sup_{0 \le \rho \le 1} -\rho r + E_0(p, \rho),$$

which is equivalent to the following two statements:

1 For any $\rho \in [0, 1]$, we always have

$$pz \ge -\rho r + E_0(p, \rho). \tag{98}$$

2 For any $\epsilon > 0$, we can find $\rho_\epsilon$ such that

$$pz - \epsilon < -\rho_\epsilon r + E_0(p, \rho_\epsilon). \tag{99}$$

Similarly, what we want to show is equivalent to the following:

1 For any $\rho \in [0, 1]$, we always have

$$r \ge -\frac{pz}{\rho} + \frac{E_0(p, \rho)}{\rho}. \tag{100}$$

2 For any $\eta > 0$, we can find $\rho_\eta$ such that

$$r - \eta < -\frac{pz}{\rho_\eta} + \frac{E_0(p, \rho_\eta)}{\rho_\eta}. \tag{101}$$

It is easy to see that (100) follows directly from (98). Thus, it suffices to show that (101) is true. To do this, we first
construct an $\epsilon$ from $\eta$ as follows:

$$\epsilon = \frac{pz\eta}{p - r + \eta}. \tag{102}$$

First we check that $\epsilon > 0$. This is true if $p > r$. From the coding theorem, the largest rate
available for reliable communication, which is defined as capacity, is equal to $\log(1 + p)$ (nats per symbol) for
the AWGN channel. Hence, $r \le c = \log(1 + p) < p$.

From (99), we know we can find a $\rho_\epsilon \in [0, 1]$ such that

$$r < -\frac{pz}{\rho_\epsilon} + \frac{E_0(p, \rho_\epsilon)}{\rho_\epsilon} + \frac{\epsilon}{\rho_\epsilon}.$$

Next we show that $\epsilon / \rho_\epsilon \le \eta$.

From Lemma 4 and (99), we know that

$$pz - \epsilon < -\rho_\epsilon r + \frac{p \rho_\epsilon}{1 + \rho_\epsilon} < -\rho_\epsilon r + p \rho_\epsilon = (p - r)\rho_\epsilon.$$

Hence, we must have

$$\rho_\epsilon > \frac{pz - \epsilon}{p - r}.$$

Thus,

$$\frac{\epsilon}{\rho_\epsilon} < \frac{\epsilon (p - r)}{pz - \epsilon}.$$

Use (102) to get

$$\frac{\epsilon}{\rho_\epsilon} < \eta.$$

In other words, for any $\eta > 0$, we simply use $\rho_\eta = \rho_\epsilon$, and we will have (101), which completes the proof of
this theorem.
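The algebra behind the choice of $\epsilon$ in (102) can be checked mechanically: substituting it into the bound $\epsilon(p - r)/(pz - \epsilon)$ should return exactly $\eta$. The following is a minimal sketch with illustrative function names and test values that are not taken from the paper; it assumes only $p > r$, $z > 0$ and $\eta > 0$.

```python
def epsilon_from_eta(p, z, r, eta):
    """Equation (102): epsilon = p*z*eta / (p - r + eta)."""
    if not (p > r and z > 0.0 and eta > 0.0):
        raise ValueError("requires p > r, z > 0 and eta > 0")
    return p * z * eta / (p - r + eta)


def bound_on_epsilon_over_rho(p, z, r, eta):
    """Upper bound epsilon*(p - r)/(p*z - epsilon) on epsilon/rho_epsilon."""
    eps = epsilon_from_eta(p, z, r, eta)
    return eps * (p - r) / (p * z - eps)


if __name__ == "__main__":
    # Hypothetical values: SNR per symbol p, exponent z, rate r, slack eta.
    print(bound_on_epsilon_over_rho(p=0.1, z=0.1, r=0.05, eta=0.01))
```

The construction also keeps $pz - \epsilon = pz(p - r)/(p - r + \eta)$ strictly positive, which is what makes the division in the proof legitimate.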



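D Proof of $\lim_{p \to 0} \rho(p) = \rho^*$

We need to show that

$$\lim_{p \to 0} \rho(p) = \rho^*,$$

where $\rho(p)$ is the optimizing $\rho$ for the following equation

$$\rho(p) = \arg\sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{\dot{E}_0(0, \rho)}{\rho} + \frac{p\, \ddot{E}_0(0, \rho)}{2\rho},$$

and $\rho^*$ is defined as follows

$$\rho^* = \arg\sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{1}{1 + \rho} = \frac{\sqrt{z}}{1 - \sqrt{z}}.$$

The assumption we can use here is that $\ddot{E}_0(0, \rho)/\rho$ is a continuous and bounded function of $\rho$ for $\rho \in [0, 1]$. A direct
consequence of this assumption is that as $p \to 0$,

$$\frac{\dot{E}_0(0, \rho)}{\rho} + \frac{p\, \ddot{E}_0(0, \rho)}{2\rho} \to \frac{\dot{E}_0(0, \rho)}{\rho} \quad \text{uniformly for } \rho \in [0, 1]. \tag{103}$$

From the first-order calculation, we know that $\dot{E}_0(0, \rho) = \frac{\rho}{1 + \rho}$.

We prove $\lim_{p \to 0} \rho(p) = \rho^*$ using the formal definition of the limit. For any $\epsilon_0 > 0$, we show that we can find
$\delta > 0$ such that for all $p < \delta$, we always have

$$|\rho(p) - \rho^*| < \epsilon_0.$$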
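To see this, define

$$\epsilon = (1 - \sqrt{z})^2 - \min\big(g(\rho^* - \epsilon_0),\ g(\rho^* + \epsilon_0)\big),$$

where

$$g(\rho) = -\frac{z}{\rho} + \frac{1}{1 + \rho}.$$

Now we use (103). For this $\epsilon$, we can find $\delta' > 0$ such that for any $\rho \in [0, 1]$ and for all $p < \delta'$,

$$\left| \frac{\dot{E}_0(0, \rho)}{\rho} + \frac{p\, \ddot{E}_0(0, \rho)}{2\rho} - \frac{1}{1 + \rho} \right| < \frac{\epsilon}{2}.$$

Thus, we have

$$\sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{\dot{E}_0(0, \rho)}{\rho} + \frac{p\, \ddot{E}_0(0, \rho)}{2\rho} > \sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{1}{1 + \rho} - \frac{\epsilon}{2} = (1 - \sqrt{z})^2 - \frac{\epsilon}{2}.$$

On the other hand, we also have

$$\sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{\dot{E}_0(0, \rho)}{\rho} + \frac{p\, \ddot{E}_0(0, \rho)}{2\rho} = -\frac{z}{\rho(p)} + \frac{1}{1 + \rho(p)} + \frac{p\, \ddot{E}_0(0, \rho(p))}{2\rho(p)} \le g(\rho(p)) + \frac{pM}{2},$$

where $M$ is the upper bound for $\ddot{E}_0(0, \rho)/\rho$ for $\rho \in [0, 1]$. We choose $\delta = \min(\delta', \epsilon/M)$; then for all $p < \delta$, we have
$\frac{pM}{2} < \frac{\epsilon}{2}$. Further,

$$g(\rho(p)) > (1 - \sqrt{z})^2 - \epsilon.$$

From the definition of $\epsilon$, we must have

$$|\rho(p) - \rho^*| < \epsilon_0,$$

which finishes the proof of this part.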
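The limiting optimization in this appendix has a closed form that is easy to confirm numerically. The sketch below assumes $g(\rho) = -z/\rho + 1/(1+\rho)$ and $0 < z \le 1/4$ (so that the maximizer lies in $(0, 1]$), and compares a plain grid search against $\rho^* = \sqrt{z}/(1 - \sqrt{z})$ and the maximum value $(1 - \sqrt{z})^2$. The helper names and the grid resolution are illustrative choices.

```python
import math


def g(rho, z):
    """Limiting objective g(rho) = -z/rho + 1/(1 + rho)."""
    return -z / rho + 1.0 / (1.0 + rho)


def rho_star_closed_form(z):
    """Stationary point of g: sqrt(z) / (1 - sqrt(z))."""
    s = math.sqrt(z)
    return s / (1.0 - s)


def grid_argmax(z, steps=10000):
    """Maximize g over rho in {1/steps, 2/steps, ..., 1} by exhaustive search."""
    best_rho, best_val = None, -math.inf
    for k in range(1, steps + 1):
        rho = k / steps
        val = g(rho, z)
        if val > best_val:
            best_rho, best_val = rho, val
    return best_rho, best_val


if __name__ == "__main__":
    for z in (0.04, 0.16):
        print(z, grid_argmax(z), rho_star_closed_form(z), (1.0 - math.sqrt(z)) ** 2)
```

Setting $g'(\rho) = z/\rho^2 - 1/(1+\rho)^2 = 0$ gives $\sqrt{z}(1 + \rho) = \rho$, which is where the closed form comes from.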



E Proof of Lemma M 



The first-order calculation gives us 



E o (0,p) 



1 + P 
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Thus, 



Eo(p,p) _ go(M 
VP P 

p 

pp 2 

- ln [e^+p mf {qp}& g (p) inf^o / a{y) 1+ Pdy 



pp 2 



In — ™ 1 



< 



el+P inf {9 P }eS(p) inf/j^o/aC?/) 1 ^^ 

" " 

eTT? lnf { gp }gg(p) lnf a>o J »fa) 1+p ^ 
pp 2 

PP 

-e~ mf {qp}e g (p) infp> J a(y) 1+p dy + 1 



9 PP 

^ inf {fc}6g(p) mf^o / a(y)i+Pdy 

< ~ e 1+p inf feKgw inf /3>o / a (y) +Pd y + 1 (104) 

pp 2 e~ 



— PP 

inf fe}ee( P ) inf /?>o / a (v) p(l y + e 1+p 



pp 2 



The inequality (104) is true because Lemma 4 implies

inf inf / a(y) 1+p dy = e - E ° M > e~^, 



{9p}66(p) 

which leads to 



{g P }eG(p) 

On the other hand, 



e i+" inf inf / a(y) l+p dy + 1 < 0. 



inf inf / a(y) 1+p dy < inf / a(y) 1+ ^y| /3=0 

inf / f^gfc/d/kfc) 1 ^ J 
{q P }eg( P )J VV / 

- inf - / [y2lkf(y\xk) dy 

{q P }&G{p)J \T / 



dy 



1. 



These two bounds together give us (104).
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F Proof of Lemma ITT1 



First we check that 



f w (y)M\y)dy = E 



2Re(x 1 x*) 

/?'(|3i| 2 + |3 2 | 2 -2p) -fl(|si| 2 + |s a | 2 ) (1+p *y2 



(105) 



and thus 



/ f w {y)T 2 {y)dy 



f w (y)(M(y) - Ifdy 



E 



e /3*(ki| 2 + k2| 2 -2p) e -e(|z 1 | 2 + |x 2 | 2 ) I {1+p , ?2 _ : 



e /3*(M 2 - P ) e -0M 2 



Since E Q (p, q p ,p) > E Q (p, QPSK, p), and 

E (p,q P ,p) = - In / a(y) 1+ »dy < -(l + p)hxE [e^M 2 "^-^! 2 ] , 



we have 



E 



e /3*(M 2 -p) e -0M 2 



E (p,QPSK,p) 

< e !+p 



As we will show later, E °(P'Q^ Sk <P) converges to uniformly. In other words, we can write E a (p, QPSK, p) 
as + p5(p), where ^LL g OCS to zero uniformly for all p as p goes to 0. Thus, 



E e r(M 2 -P) e -%l 2 < e -(iT^+Tf^i 



pp p 



8(p) 



Note we should always have 



e P*(\x\ 2 -p) e -8\x\ 2 



< 1, 



for the optimizing (3*. This can be seen by the following sequence of inequalities: 



(E 



e r(M 2 -p) e -0M 2 



\i+p 



< inf / a 1+p dy 



< f a l+ Pdy 
J 13=0 

= J (j2<lkf(y\xk)^ dy 

< / ^2<lkf{y\xk)dy 
J k 

= 1. 
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Thus, 



E 



e (i*{\x\>~p) e -6\xf 



> l e -(TT^ + ^ (p) -l 



> < 



PP 



(i + pY l + p 
= ey + 5(p 2 ). 



P J(p) (j&p-frM 



2 ^ 2 



On the other hand, we have 



E 



> E 



> 



> 



> 



> 



> 



> 



2Re(x-^ J 



E 



3 /3*(ki| 2 +k2| 2 -2p) e -e(| a; i| 2 +|x 2 |2) e (1+p) / _ j 



o r(kl| 2 + k2| 2 -2p)„-e(|a;i| 2 +|x 2 | 2 )^£(^1^2)^ 

(1 + P) 4 

/3*(ki| 2 + |x 2 | 2 -2p) r -e(|x 1 | 2 + k 2 | 2 ) 2 t>lr x 2r + ^lc^lc + 2xi r Xi c X 2r X2 c ) 

(1+P) 4 



(1 + P) 4 

(E[eP*^\ 2 -P)e- e \^\ 2 (xl r + x\ c )}) 



(E[e^ 



(1 + P) 4 

(ki| 2 -p) e -eki| 2 | Xl |2 



(1 + P) 4 

(E[(i + /r(\x 1 \ 2 -p)-e\x 1 \ 2 )\x 1 \ 2 ]) 2 



H + pY 



( P -eE[\x^}) 2 

(i + p) 4 

(p-^ +2a f 
(1 + p) 4 

p 2 - 20i^p 2+2a 



P 



+ 5{p 2 ). 



(1 + P) 4 

In the above equations, $x_{ir}$ and $x_{ic}$ denote the real part and imaginary part of the random variable $x_i$, $i = 1, 2$.
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G Proof of Lemma HU 

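To prove Lemma 12, we first establish two other lemmas. The first lemma shows that we can restrict ourselves
to considering distributions which are symmetric around 0. Define $T(q) = \int \alpha(y)^{1+\rho}\, dy$.

Lemma 19 Given any distribution $q(x) \in \mathcal{F}(p)$, we can find a symmetric distribution $q_e(x) \in \mathcal{F}(p)$, i.e.,
$q_e(x) = q_e(-x)\ \forall x$, such that $T(q_e) \le T(q)$.

Proof: We first compute $T(\cdot)$ for $q(-x)$ and show that it is the same as $T(q)$.

$$\begin{aligned}
& \int \left( \int q(-x)\, e^{\beta(|x|^2 - p)} f_w(y - x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy \\
&= \int \left( \int q(x)\, e^{\beta(|{-x}|^2 - p)} f_w(y + x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy \\
&= \int \left( \int q(x)\, e^{\beta(|x|^2 - p)} f_w(-y + x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy \\
&= \int \left( \int q(x)\, e^{\beta(|x|^2 - p)} f_w(y - x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy \\
&= T(q).
\end{aligned}$$

For $\rho \in [0, 1]$, it is easy to see that $\int \alpha(y)^{1+\rho}\, dy$ is a convex function of $q(x)$ for a fixed $\beta$. Thus if we choose
$q_e(x) = \frac{1}{2}(q(x) + q(-x))$, the power constraint will still be valid and we have

$$T(q_e(x)) = \int \left( \int q_e(x)\, e^{\beta(|x|^2 - p)} f_w(y - x)^{\frac{1}{1+\rho}}\, dx \right)^{1+\rho} dy \le \frac{1}{2}\big(T(q) + T(q)\big) = T(q).$$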
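The symmetrization argument of Lemma 19 can be illustrated numerically. The sketch below makes several simplifying assumptions that are not in the paper: a real scalar channel with zero-mean, unit-variance Gaussian noise, a two-point asymmetric input with $E[x^2] = p = 0.2$, and arbitrary fixed values of $\beta$ and $\rho$. It evaluates $T(q)$ by the trapezoidal rule and compares $q$, its reflection, and the symmetrized $q_e$.

```python
import math


def noise_pdf(u):
    """Zero-mean, unit-variance real Gaussian density (symmetric about 0)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def T(q, beta, rho, power, half_width=12.0, steps=4800):
    """Trapezoidal estimate of T(q), the integral of alpha(y)^(1+rho) over y.

    q maps each input point x to its probability.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        alpha = sum(
            w * math.exp(beta * (x * x - power))
            * noise_pdf(y - x) ** (1.0 / (1.0 + rho))
            for x, w in q.items()
        )
        total += (0.5 if i in (0, steps) else 1.0) * alpha ** (1.0 + rho)
    return total * h


def reflect(q):
    """Distribution of -x when x is distributed according to q."""
    return {-x: w for x, w in q.items()}


def symmetrize(q):
    """q_e(x) = (q(x) + q(-x)) / 2."""
    q_e = {}
    for x, w in q.items():
        q_e[x] = q_e.get(x, 0.0) + 0.5 * w
        q_e[-x] = q_e.get(-x, 0.0) + 0.5 * w
    return q_e


# Hypothetical asymmetric input: E[x^2] = 0.5*0.04 + 0.5*0.36 = 0.2.
q_example = {0.2: 0.5, -0.6: 0.5}
p_example = 0.2

if __name__ == "__main__":
    for name, dist in (("q", q_example),
                       ("reflected q", reflect(q_example)),
                       ("q_e", symmetrize(q_example))):
        print(name, T(dist, beta=0.1, rho=0.5, power=p_example))
```

Reflection leaves $T$ unchanged because the noise density is even, and the convexity of $u \mapsto u^{1+\rho}$ then pushes $T(q_e)$ to or below $T(q)$, as in the proof.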

The second lemma provides an upper bound for $E[e^{\beta^*(|x|^2 - p)} e^{-\theta |x|^2} |x|^2]$, which is a key term in the proof of
Lemma 12.

Lemma 20 For any input distribution $\{q_p\}$ which has mean variance $p$, let $\beta^*$ be the optimizing $\beta$ as in (57). We
must have

$$E\left[e^{\beta^*(|x|^2 - p)} e^{-\theta |x|^2} |x|^2\right] \le p\, e^{\theta p}. \tag{106}$$

Proof: Denote

$$h(\beta) = \int \left( \sum_k q_k\, e^{\beta(|x_k|^2 - p)} f(y|x_k)^{\frac{1}{1+\rho}} \right)^{1+\rho} dy.$$

If $\beta^*$ is the optimizing $\beta$, applying the Kuhn-Tucker condition here, we must have

$$\beta^* h'(\beta^*) = 0,$$
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which yields j3* = or 

J \ k I k 

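which can be simplified as

$$p \int \alpha(y)^{1+\rho}\, dy = \int \alpha(y)^{\rho}\, \gamma(y)\, dy. \tag{107}$$

Here we let

$$\alpha(y) = \sum_k q_k\, e^{\beta^*(|x_k|^2 - p)} f(y|x_k)^{\frac{1}{1+\rho}};$$

$$\gamma(y) = \sum_k q_k\, e^{\beta^*(|x_k|^2 - p)} f(y|x_k)^{\frac{1}{1+\rho}} |x_k|^2.$$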

If $\beta^* = 0$, (106) is trivial.

If $\beta^* > 0$, we derive (106) using (107). Note that

[a( y y-f(y)dy > J Y.^* pM ~ p) f(y\ x *)^~n{y)dy 

k 

= Y.^ pM ~ p) Y.'li^^ l? ~ P) \ x i\ 2 I f(y\xk)^f(y\xi)^dy 

k I J 

l k 

l 

> Y.<li^^ Xl? ~ P) \xi\ 2 e-^e- e \^ 2 
i 

= e-^E[e^^ 2 -^e- e ^ 2 \x\ 2 }. 
On the other hand, as we have shown before, 

Ja(y) 1+ ?dy = inf J f(y\x k )^) ' +P dy < 1. 

Thus, we must have 

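$$E\left[e^{\beta^*(|x|^2 - p)} e^{-\theta |x|^2} |x|^2\right] \le e^{\theta p} \int \alpha(y)^{\rho}\, \gamma(y)\, dy = p\, e^{\theta p} \int \alpha(y)^{1+\rho}\, dy \le p\, e^{\theta p}. \tag{108}$$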

o 

Now we prove LemmalT^l 

f l A>/)T 3 (y)dy = [f w (y)(M(y)-lfdy 

f w (y)(M 3 (y) ~ 3M 2 (y) + 3M(y) - 
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It is easy to check that 

f w (y)M(y)dy 
f w {y)M 2 {y)dy 

f w {y)M\y)dy 



E 
E 

E 



e /3*(Nil 2 -p) e -%i| 2 



2Re.{x\x<2 ) 



J*{\x^+\x 2 \^-2p) -0(|*i| 2 +|*a| 2 ) 



e ^(\xi\ 2 +\x 2 \ 2 +\^\ 2 -3p) e -e(\ xl \ 2 +\^\ 2 +\^\ 2 ) e 



2Re(x-^ ^2 +a; j +a?2 ^3 ) 

(i+p) 2 



where $x_1$, $x_2$ and $x_3$ are i.i.d. random variables with distribution $\{q_p(x)\}$. Thus, after some manipulations, we
have

, 3 



f w (y)T 3 (y)dy = (e 



e P*(.\xi\ 2 -p) e -6\xi\ 2 



1 



-3E 



2 /3*(| :Cl | 2 +|z2| i -2p) e -e(|x 1 | 2 +|z 2 | 2 ) e (1+p)2 _ x 



e /9«(|xi| a +|x a | 2 +|xa| a -3p) e -«(|xx| a +|x 2 | a +|x 8 | 2 ) e "l^p 2 ~ 



- 1 



From the proof in Lemma[TT1 we know 

e -0p -i< E [ e /J*(l*i| a -P) e -«|xi| a ] _ 1 < 0, 

and thus, we must have 



E 



1 



< (1 _ e - tf P)3 < Q3 p 3_ 



^xp{(p*-e)(\ Xl \ 2 -p)}' 

On the other hand, we expand the second and third terms on the RHS of (109) as follows:

E 

00 

= 

k=l 







e 









,r(|xi| 2 +|x 2 | 2 -2 P ) -e(\x 1 \ 2 +\x 2 \ 1 ) {^Re{xix* 2 )) k 

(l + p) 2k k\ 



and 



/ 2Re(x-\ x -\-x-i +3:0X0) 

3 /3*(|x 1 | 2 + |x 2 | 2 + |x 3 | 2 -3p) e(|x 1 | 2 + |x 2 | 2 + |x 3 | 2 ) _ ' ' ■- 



(i+p) 2 



„0n\x 1 \ 2 +\x 2 \ 2 Mx 3 \ 2 -Zv) r -e(\x 1 \*Mx2\ 2 Mxz\ 2 ) 2k ( R < x i x *2) + Re(x lX * 3 ) + i?e(x 2 4)) fc 



fe=l 
00 

E E * 



(109) 



(110) 



e r(Ni| 2 +|x 2 | 2 +|x3| 2 -3p) e -0(|xi| 2 +|x2| 2 +|x 3 | 2 ) 
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(1 + P ) 2k k\ 
2 ) 2 k C^ n Re(x 1 xl) l Re{x 1 x%rRe{x 2 x%Y 



{l + p) 2k k\ 



where $C^{(k)}_{lmn}$ is a non-negative constant independent of $p$.

It is straightforward to check the following, using the above two expansions:

2 R e ( x i x 2 + x ^ x ^ + x 2 x ^ J 



/9"(|*i| a +|*a| 2 +|x3| a -3p) P ^(|zi| 2 +M 2 + k 3 | 2 ) I 



- 1 



-3£ 



2Re(x\ x% ) 



e /3*(NI 2 +M 2 -2p) e -0(M 2 +|z 2 | 2 ) e (1+p) / _ j 



E E 

k=i 

l + m + n = k: 



E 



o p* { \ Xl \2 +lx2l 2 +lx3l 2_ 3p) 9{lxj | 2+| ^ |2+| ^ |2) 2 k C^ n Re(x 1 x* 2 ) l Re(x 1 x* 3 ) m Re(x 2 x* 3 y 

{l + p) 2k k\ 



l,m,n < k 

00 

+3E^ 



k=i 



c 0*(\x l \ 2 +\x 2 \ 2 -2p) c -e(\x l \ 2 +\x 2 \ 2 ) ( 2Re ( X l X 2)) 



{^(I*3| a -P) e -%3| 2 ]_ 1 }. 



(l+p) 2fe A;! 

Next, we bound the two terms above separately, using the bound $\mathrm{Re}(z) \le |z|$. Note that for symmetric
distributions, it is easy to see that all the terms with odd $k$ vanish. Thus, we can remove the term with $k = 1$.



E^ 

k= 

00 

< E 

k=2 

00 

< J2 E 



e P*(\x 1 \ 2 +\x 2 \ 2 -2 P ) e -9(\x 1 \ 2 +\x 2 \ 2 ) {1Re{xix* 2 )) k 



E 



k=2 



{l + p) 2k k\ 

{2Re( Xl x* 2 )) k 
(l + p) 2k k\ 

0"(\x 1 \ 2 +\x 2 \ 2 -2p)-e(\x 1 \ 2 +\x 2 \ 2 ) 2k \ x i\ k \ x 2\ k 



e /3*(|x 1 | 2 +|x 2 | 2 -2p) e -e(| a;i | 2 +| :C2 | 2 ) 



< ^Ett 



2 fe 

+ p) 2fc fe! 



(1 + /)) 2fc fc! 
2 



{ E[e ^(NI 2 - P ) e -% 3 | 2 ] _ !} 
^^(NP-p)^! 2 ].!} 

(1 - e" ep ) 



e r(M 2 - P ) e -eM 2 | Xl | 



A 2 (l + p) 2 ^! 

< A9pe 2K ™ (E 

< 4ee 29p e 2K ™ P 3 . 



e ^(Nil 2 -p) e -eki| 2 i a , 1 |2 



e /3*(Ni| 2 -P) e -ek 1 | 2 | Xi |2 



Similarly, for the other term, we can also remove the terms where $k$ is odd. Actually, we can do more. For
example, when $k = 2$, since at least two of $l, m, n$ are required to be non-zero, two of them must be 1
while the other is 0. It is easily seen that the contribution of this term is also zero for symmetric distributions.
Thus, we remove the terms for both $k = 1$ and $k = 2$.



E E 

k=i 

l + m + n = k; 
l,m,n < k 



E 



(1 + p) 2k k\ 
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E 

l + m + n = k; 
l,m,n < k 

E 

I + m + n = k; 
l,m,n < k 

E[ e W x 2-rte- e ^ 2 \x 2 \ l+n ] *E 
00 2 k K 2k ~ e 



e /3^|x 1 | 2 +|x 2 | 2 + | a; 3| 2 -3p) e -e(|x 1 | 2 + |x 2 | 2 +|x3| 2 )| Xi |«+m| X2 |m+n| a , 3 | 



E 

k=3 



(l+p) 2k k\ 



e l3*(\ Xl \*-p) e -e\xtf<Al+m 



oP*{\x 3 \ 2 -p) e -e\x 3 \ 2 i x ,m+n 



E C lmn ( E 

I + m + n = k; 

Lm,n < k 



g^dxi^-pjg-eix!! 2 ! ,2' 



00 R fe 7^2fc-6 . 



3 /3*(|x 1 | 2 -p) e -e| a; i| 2 | Xi |2 



< 216e 6 ^e 3 V- 



Combining all these bounds, we have 



f w (y)T 3 (y)dy 



< 9 3 p 3 + 126e 2ep e 2K ™p 3 + 216e 6i ^e 3e V 

< Ce 3dp p 3 , 



where $C$ is a constant that is independent of $p$ and of the choice of input distribution, as long as the distribution
is in $\mathcal{F}(p)$.

H Proof of Lemma |7l 



To show this, we need to check ( PHfl ) for a sequence of mean-zero input distribution q p G F{p) ■ Since it is always 
true that 

limsup Eo{p, qp ,P*) < P* 



P 



1 + p* 



it suffices to show that 



Hminf M^) > ^ 



P 



1 + /)* 
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Note 



Eo{p,q P ,P*) = sup -In / a{y) 1+p *dy 

/3>0 



sup -In f f w (y)(l+T{y)) 1+ P*dy. 

0>O J 



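To achieve a lower bound, we choose $\beta = \theta = \frac{\rho^*}{(1+\rho^*)^2}$. Further, we use the following inequality

$$(1 + t)^{1+\rho^*} \le 1 + (1 + \rho^*)t + \frac{\rho^*(1 + \rho^*)}{2} t^2.$$

This leads to

$$E_0(p, q_p, \rho^*) \ge -\ln \int f_w(y) \left( 1 + (1 + \rho^*)T(y) + \frac{\rho^*(1 + \rho^*)}{2} T^2(y) \right) dy.$$

When $\beta = \theta$, it can be shown that

$$\int f_w(y)\big(1 + (1 + \rho^*)T(y)\big)\, dy = -\rho^* + (1 + \rho^*)e^{-\theta p}$$

and

$$\int f_w(y)\, T^2(y)\, dy = 1 - 2e^{-\theta p} + E\left[e^{\frac{2\mathrm{Re}(x_1 x_2^*)}{(1+\rho^*)^2}}\right] e^{-2\theta p},$$

where $x_1$ and $x_2$ are i.i.d. random variables distributed according to $q_p(x)$.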
Next we claim 

Um / f w (y)T 2 (y)dy = Q 

p^O 



P 



Since lim. 



(l_ e -flp)2 
p^O p 



0, it suffices to show 





2Re(x^ x^ ) 




E 


e (i+p*) 2 


- 1 


P 



0. 



lim 

p^O 

Using the assumption that q p (x) is symmetric around and 

I x\ max *~ K-mP > 

we can show this following a similar procedure as in the proof of Lemma[l2l 
Thus, we have 

Um fafgfe "-"•> > liminf -M-p- + (l+/)e- a - + oW) 



p^O 



P^O J> 

In(l - A + o(p)) 



lim inf 

p^O 

1 + p*' 



P 
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I BPSK and QPSK for AWGN channels 

Since for both BPSK and QPSK we have $|x|^2 = p$ with probability 1, the power-constraint parameter $\beta$ does not
play a role here and $E_0(p, q_p, \rho)$ can be simplified to

$$E_0(p, q_p, \rho) = -\ln \int \alpha(y)^{1+\rho}\, dy,$$

with

$$\alpha(y) = \int q_p(x)\, f_w(y|x)^{\frac{1}{1+\rho}}\, dx.$$

Again, we use the two inequalities which have been very helpful to us in the general first- and second-order
calculations:

$$(1 + t)^{1+\rho} \le 1 + (1 + \rho)t + \frac{\rho(1 + \rho)}{2} t^2, \tag{111}$$

$$(1 + t)^{1+\rho} \ge 1 + (1 + \rho)t + \frac{\rho(1 + \rho)}{2} t^2 - \frac{\rho(1 + \rho)(1 - \rho)}{6} t^3. \tag{112}$$

We write $\int \alpha(y)^{1+\rho}\, dy$ as follows

$$\int \alpha(y)^{1+\rho}\, dy = \int f_w(y)\big(1 + T(y)\big)^{1+\rho}\, dy, \tag{113}$$

where T(y) denotes 

T(y) = pH7w)J ~~ 

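It is easy to check that for BPSK or QPSK, we have

$$\int f_w(y)\, T(y)\, dy = e^{-\theta p} - 1;$$

$$\int f_w(y)\, T^2(y)\, dy = \left(e^{-\theta p} - 1\right)^2 + e^{-2\theta p}\, E\left[e^{\frac{2\mathrm{Re}(x_1 x_2^*)}{(1+\rho)^2}} - 1\right];$$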

/2Re{x\ ) + 2i?e(a; ^ )-\-2Re(x2 ) 2Re(x^X2 ) 

f w (y)T 3 (y)dy = (e' 9p - if + e~ 3ep £[e O+ri 3 ~ - 1] - 2e~ 29p E[e d+^) 2 - 1]. 

Further, for BPSK, we can calculate that 

£[e (i+rt 3 ] = - eOW 7 + e 0^ - 2 = 1 + - + <5(p 2 ). (114) 
2 V / (1 + p) 



and 



2Re(x 1 x^) n 2 

E[e^^] = l + -fP+5{p 2 )- 



2Re(x 1 x*) + 2Re(x 1 x*) + 2Re(x 2 xp f, 2 
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(1 + P) 4 
6p 2 



which further yield an upper bound and lower bound for J a(y)dy, 

J a (y)dy < 1 + (1 + p) J Uy)T{y)dy + P^±A J f w ( y ) T \y)dy 

= 1 + (1 + p){e* - 1) + | (e-* - I) 2 + + 5(p 2 

< i + (i + P)(-^ + ^) + ^{^V + ir f^ + ^ 2 )} 



-i/y !./// > 1 + (l + p) / U{y)T{y)dy + ^^- / f w (y)T 2 (y)dy - p{1 + P) / f w {y)T 6 {y)dy 



l + p r (1 + p) 3 2 
1 + (1 + p)( e -* - 1) + |(e-^ - I) 2 + + 5(p 2 )} + p5(p*) 



?V 3 P\ , P(l + P) f, fl ^VV 2p 2 

1 6- ) + ^— {(-«P + -2") +(TT- p) 4 



> 1 + (1 + p){-6p + ^-y ) + < (-«/> + ~~7~ r + 77~~~777 + S(p 2 ) } + ,»>V ) 

P _ , P 3 + P 2 + 2pp 2 , ^ 2a 



- ' i + / + (i + P )3 T + * 

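In other words, we must have

$$\int \alpha(y)^{1+\rho}\, dy = 1 - \frac{\rho}{1 + \rho}\, p + \frac{\rho^3 + \rho^2 + 2\rho}{(1 + \rho)^3}\, \frac{p^2}{2} + o(p^2).$$

Thus,

$$\begin{aligned}
r(p) &= \sup_{0 \le \rho \le 1} -\frac{pz}{\rho} + \frac{E_0(p, \mathrm{BPSK}, \rho)}{\rho} \\
&= \sup_{0 \le \rho \le 1} -\frac{pz}{\rho} + \frac{-\ln \int \alpha(y)^{1+\rho}\, dy}{\rho} \\
&= \sup_{0 \le \rho \le 1} -\frac{pz}{\rho} + \frac{p}{1 + \rho} - \frac{p^2}{(1 + \rho)^3} + o(p^2).
\end{aligned}$$

From here, it is easy to check that

$$\frac{E_0(p, \mathrm{BPSK}, \rho)}{p\rho} \to \frac{1}{1 + \rho}$$

uniformly for $0 \le \rho \le 1$ as $p \to 0$. Further,

$$\frac{\dfrac{E_0(p, \mathrm{BPSK}, \rho)}{p\rho} - \dfrac{1}{1 + \rho}}{p} \to -\frac{1}{(1 + \rho)^3}.$$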

From Theorem and Theorem [8] we know this implies 

$$\dot{r}(0) = \sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{1}{1 + \rho} = (1 - \sqrt{z})^2;$$

$$\ddot{r}(0) = \frac{\ddot{E}_0(0, \mathrm{BPSK}, \rho^*)}{\rho^*} = -\frac{2}{(1 + \rho^*)^3} = -2(1 - \sqrt{z})^3.$$

Therefore, BPSK is first-order optimal but not second-order optimal.

The QPSK calculations are very similar to the BPSK calculations, and we can show that for QPSK

$$\dot{r}(0) = \sup_{0 \le \rho \le 1} -\frac{z}{\rho} + \frac{1}{1 + \rho} = (1 - \sqrt{z})^2;$$

$$\ddot{r}(0) = \frac{\ddot{E}_0(0, \mathrm{QPSK}, \rho^*)}{\rho^*} = -(1 - \sqrt{z})^3,$$

which implies that QPSK is near-optimal.


J Proof of Lemma [16] 

It suffices to check (78) for this choice of input distributions. When $q_{W_c}$ has i.i.d. entries, we have the following:



E (P,q Wc ,p\W c ) > E o (P,q Wc ,p*,W c )\p =0 



lnE H 



P\H\ 2 



y) 1/3=0} 



where 6 = -jj+^y- Following Appendix IH1. we know that if q is symmetric around 0, we have 



liminf M^W)|fc.> f 



p 



l + p* 



From Lemma |4] 



Er(p,q,p*)y= e < P * 



Thus, actually, if we take [5 = 6, the limit of 
This result also implies 



l + p* 

exists and is equal to j^— f 



lim DEfr&f-Mp*)^ 



p*P\H\ 

l + p* 



a.e. for \H\ 2 E R+. 



On the other hand, since E^ F ( P ^ ,q, p*)\p = g > 0, we know 

.P\H\ 2 



exp{-DE^(^^, q ,p*)y =e } < 1- 
Thus, we can apply the dominated convergence theorem to (115) and we have



lwiME (P,q Wc ,p*,W c ) > lim -—]nE H 

W c — »oo Wc— >°o J c 



Y 1uEh 



eM-DE^ F ( 
lim exp{-D£^ F ( 



W C -KX> 



P\H\ 2 
P\H\ 2 



,q,P*)\p=e} 
,q,P*)\p=o} 






(115) 



(116) 



= --\nE H 
1 c 

= — n(l + - -). 

T c v 1 + p* ' 

Thus, (78) holds for this choice of input distributions. However, there is a subtlety in applying the AWGN results
here, since the $\rho^*$ in the AWGN case and the $\rho^*$ used here are different. This is resolved
by observing that the inequality (116), which we borrowed from Appendix H, is actually true for any fixed $\rho$.
Thus we can choose $\rho^*$ to be the optimizing $\rho$ for (77), which completes the proof.
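The closed form $\frac{1}{T_c}\ln\big(1 + \frac{\rho P T_c}{1+\rho}\big)$ is what one obtains by averaging an exponential over Rayleigh fading. The sketch below assumes the averaged quantity has the form $E_H[e^{-s|H|^2}]$ with $s = \rho P T_c/(1+\rho)$; this form is an assumption made for illustration. Since $H$ is a unit complex Gaussian, $|H|^2$ is exponentially distributed with mean 1, and the expectation is evaluated by the trapezoidal rule. Function names and parameter values are hypothetical.

```python
import math


def laplace_unit_exponential(s, upper=60.0, steps=60000):
    """Trapezoidal estimate of E[exp(-s*|H|^2)] for |H|^2 ~ Exp(1)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Integrand: exp(-s*t) times the Exp(1) density exp(-t).
        total += weight * math.exp(-(1.0 + s) * t)
    return total * h


def first_order_fading_exponent(P, Tc, rho):
    """(1/Tc) * ln(1 + rho*P*Tc / (1 + rho))."""
    return math.log(1.0 + rho * P * Tc / (1.0 + rho)) / Tc


if __name__ == "__main__":
    P, Tc, rho = 2.0, 1.5, 0.5  # hypothetical values
    s = rho * P * Tc / (1.0 + rho)
    print(-math.log(laplace_unit_exponential(s)) / Tc,
          first_order_fading_exponent(P, Tc, rho))
```

The agreement follows from $E[e^{-s|H|^2}] = 1/(1+s)$ for a unit-mean exponential random variable.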
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